Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 

The 3GPP transparent end-to-end packet-switched streaming service (PSS) specification consists of seven 3GPP TSs:
3GPP TS 22.233 [1], 3GPP TS 26.233 [2], 3GPP TS 26.234 [3], 3GPP TS 26.245 [4], 3GPP TS 26.246 [5], 3GPP TS 
26.247 [49] and the present document. 

The TS 22.233 contains the service requirements for the PSS. The TS 26.233 provides an overview of the PSS. The TS 
26.234 provides the details of protocol and codecs used by the PSS. The TS 26.245 defines the Timed text format used 
by the PSS. The TS 26.246 defines the 3GPP SMIL language profile. The 3GPP TS 26.247 defines Progressive 
Download and Dynamic Adaptive Streaming over HTTP. The present document defines the 3GPP file format (3GP) 
used by the PSS and MMS services.

The TS 26.244 (present document), TS 26.245 and TS 26.246 started with Release 6. Earlier releases of the 3GPP file
format, the Timed text format and the 3GPP SMIL language profile can be found in TS 26.234. The 3GPP TS 26.247 
started with Release-10. Earlier releases of the Adaptive HTTP Streaming can be found in 3GPP TS 26.234. 



Introduction 



A file format contains data in a structured way. The 3GPP file format can contain timing, structure and media data for
multimedia streams. It is used by MMS, PSS and MBMS for timed visual and aural multimedia. 
1 Scope



The present document defines the 3GPP file format (3GP) as an instance of the ISO base media file format. The 
definition addresses 3GPP specific features such as codec registration and conformance within the MMS, PSS and 
MBMS services. 
3 Definitions and abbreviations

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

continuous media: media with an inherent notion of time. In the present document speech, audio, video, timed text and 
DIMS 

discrete media: media that itself does not contain an element of time. In the present document all media not defined as 
continuous media 

PSS client: client for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP
standards, with possible additional 3GPP requirements according to [3]

PSS server: server for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP
standards, with possible additional 3GPP requirements according to [3]

3.2 Abbreviations 

For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [6] and the following apply. 

3GP 3GPP file format 

AAC Advanced Audio Coding 

AMR-WB+ Extended Adaptive Multi-Rate Wideband Codec 

AVC Advanced Video Coding 

ADU Application Data Unit 

BIFS Binary Format for Scenes 

DIMS Dynamic and Interactive Multimedia Scenes 

Enhanced aacPlus MPEG-4 High Efficiency AAC plus MPEG-4 Parametric Stereo

FLUTE File Delivery over Unidirectional Transport

HTTP Hypertext Transfer Protocol

ITU-T International Telecommunications Union - Telecommunications 

MIKEY Multimedia Internet KEYing 

MIME Multipurpose Internet Mail Extensions 

MMS Multimedia Messaging Service 

MP4 MPEG-4 file format 

MPD Media Presentation Description 

PSS Packet-switched Streaming Service 

RAP Random Access Point 

RTP Real-time Transport Protocol 
RTSP Real-Time Streaming Protocol

SDP Session Description Protocol

SRTP Secure Real-time Transport Protocol

URL Uniform Resource Locator



4 Overview



The 3GPP file format (3GP) is defined in this specification as an instance of the ISO base media file format [7]. 3GP is 
mandated in [8] to be used for continuous media along the entire delivery chain envisaged by the MMS, independent of 
whether the final delivery is done by streaming or download, thus enhancing interoperability. 

In particular, the following stages are considered: 

- upload from the originating terminal to the MMS proxy;

- file exchange between MMS servers;

- transfer of the media content to the receiving terminal, either by file download or by streaming. In the first case 
the self-contained file is transferred, whereas in the second case the content is extracted from the file and 
streamed according to open payload formats. In this case, no trace of the file format remains in the content that 
goes on the wire/in the air. 

For the PSS, the 3GPP file format is mandated in [3] to be used for timed text and it should be supported by PSS
servers; 3GP files with streaming-server extensions should be used for storage in streaming servers and the "hint track"
mechanism should be used for the preparation for streaming. For Adaptive HTTP Streaming, HTTP streaming 
extensions are defined. 



5 Conformance

5.1 General



The 3GPP file format is structurally based on the ISO base media file format defined in [7]. However, the conformance 
statement for 3GP files is defined here by addressing constraints and extensions to the ISO base media file format, 
registration of codecs, file identification (file extension, brand identifier and MIME type) and profiles. If a 3GP file 
contains codecs or functionalities not conforming to this specification they may be ignored, i.e. a 3GP compliant file 
parser may ignore non-compliant boxes. 



5.2 Definition



5.2.1 Limitations to the ISO base media file format 

The following limitation to the ISO base media file format [7] shall apply to a 3GP file: 

- compact sample sizes ('stz2') shall not be used for tracks containing H.263, MPEG-4 video, AMR, AMR-WB, 
AAC or Timed text. 

NOTE: The extended presentation format (see clause 11) is defined by using the Meta box of the ISO base media
file format [7] that was not present in the first edition. Hence, extended presentations in 3GP files are 
explicitly signalled via the Extended-presentation profile (see clause 5.4.6). 

5.2.2 Registration of codecs 

Code streams for H.263 video [9], MPEG-4 video [10], H.264 (AVC) video [29], AMR narrow-band speech [11], AMR 
wide-band speech [12], Extended AMR wide-band audio [21], Enhanced aacPlus audio [23, 24, 25], MPEG-4 AAC 
audio [13], and timed text [4] can be included in 3GP files as described in clause 6 of the present document. 
5.2.3 Extensions

The following extensions to the ISO base media file format [7] can be used in a 3GP file: 

- streaming-server extensions (see clause 7);

- asset information (see clause 8); 

- video-buffer information (see clause 9); 

- AVC file format (see [20] [47]); 

- RTP and RTCP reception hint tracks (see [38]); 

- SRTP and SRTCP reception hint tracks with key management information for SRTP recordings (see [38] and 
clause 12). 

If SDP information is included in a 3GP file, it shall be used as defined by the streaming-server extensions.



5.2.4 MPEG-4 systems specific elements 

For the storage of MPEG-4 media specific information in 3GP files, this specification refers to MP4 [14] and the AVC 
file format [20] [47], which are also based on the ISO base media file format. However, tracks relative to MPEG-4 
system architectural elements (e.g. BIFS scene description tracks or OD Object descriptors) are optional in 3GP files 
and shall be ignored. The inclusion of MPEG-4 media does not imply the usage of MPEG-4 systems architecture. 
Terminals and servers are not required to implement any of the specific MPEG-4 system architectural elements. 

5.2.5 Template fields 

The ISO base media file format [7] defines the concept of template fields that may be used by derived file formats. The 
template field 'alternate group' can be used in 3GP files, as defined in clause 7.2. No other template fields are used. 

5.2.6 Interpretation of the 3GPP file format 

All index numbers used in the 3GPP file format start with the value one rather than zero, in particular 'first-chunk' in
Sample to chunk box, 'sample-number' in Sync sample box and 'shadowed-sample-number', 'sync-sample-number' in
Shadow sync sample box.
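The one-based convention matters most when table entries are mapped onto in-memory arrays. The following is a minimal sketch, not part of the specification: the function names and the tuple layout are illustrative assumptions, and the Sample to chunk entries are assumed to be already parsed into (first-chunk, samples-per-chunk) pairs.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand run-length Sample to chunk entries into one count per chunk.

    stsc_entries: list of (first_chunk, samples_per_chunk) pairs, where
    first_chunk is 1-based as required by clause 5.2.6.
    chunk_count: total number of chunks in the track.
    """
    counts = []
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        # A run ends just before the next entry's first chunk,
        # or after the last chunk of the track.
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = chunk_count + 1
        counts.extend([per_chunk] * (next_first - first_chunk))
    return counts


def is_sync_sample(zero_based_index, sync_sample_numbers):
    """Sync sample box entries are 1-based sample numbers."""
    return (zero_based_index + 1) in set(sync_sample_numbers)
```

Treating 'first-chunk' as zero-based here would shift every run by one chunk, which is why the conversion is kept in one place.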

5.3 Identification 

5.3.1 General 

3GP files can be identified using several mechanisms: file extension, MIME types and brands. 

5.3.2 File extension 

When stored in traditional computer file systems, 3GP files should be given the file extension '.3gp'. Readers should 
allow mixed case for the alphabetic characters. 

5.3.3 MIME types 

The MIME types 'video/3gpp' (for visual or audio/visual content, where visual includes both video and timed text) and
'audio/3gpp' (for purely audio content) shall be used as defined in [27].
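As an illustration of clauses 5.3.2 and 5.3.3, the sketch below picks a MIME type from the kinds of media in a file and tests a file name for the '.3gp' extension without regard to case. The function names and the "video"/"audio"/"text" labels are assumptions made for this example only.

```python
def has_3gp_extension(filename):
    """Readers should accept mixed case, e.g. 'clip.3GP'."""
    return filename.lower().endswith(".3gp")


def mime_type_for(media_kinds):
    """media_kinds: iterable of 'video', 'audio' and/or 'text' labels.

    Visual content covers both video and timed text, so either one
    selects 'video/3gpp'; purely audio content gets 'audio/3gpp'.
    """
    kinds = set(media_kinds)
    if kinds & {"video", "text"}:
        return "video/3gpp"
    return "audio/3gpp"
```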

5.3.4 Brands 

This specification defines several brand identifiers corresponding to the profiles defined in clause 5.4. Brands are 
indicated in a file-type box, defined in [7], which shall be present in conforming files. The fields of the file-type box 
shall be used as follows:

- Brand: Identifies the "best use" of the file and should match the file extension. For files with extension '.3gp'
and conforming to this specification, the brand shall be one of the profile brands defined in clause 5.4.

- Minor Version: This identifies the minor version of the brand. For files with brand '3gLZ', where L is a letter
and Z a digit, and conforming to version Z.x.y of this specification, this field takes the value x*256 + y.

- CompatibleBrands: a list of brand identifiers (to the end of the box). Any profile of a 3GP file is declared by
including the corresponding brand from clause 5.4 in this list.

The brand identifier (of one of the profiles) must occur in the compatible-brands list, and may also be the primary 
brand. Conformance to more than one profile is indicated by listing the corresponding brands in the compatible-brands 
list. If the file is also conformant to earlier releases of this specification, it is recommended that the corresponding 
brands ('3gp4', '3gp5', '3gp6', '3gp7' and/or '3gp8') also occur in the compatible-brands list. If, for instance, '3gp4' is not 
in the compatible-brands list, then the file will not be processed by a Release 4 reader. Readers should check the 
compatible-brands list for the identifiers they recognize, and not rely on the file having a particular primary brand, for 
maximum compatibility. Files may be compatible with more than one brand, and have a 'best use' other than this 
specification, yet still be compatible with this specification. 
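The recommendation that readers check the compatible-brands list can be sketched as follows. This is an illustrative reader, not a normative one: it assumes the file-type box uses the plain 32-bit size form of the ISO base media file format [7] (size, type 'ftyp', major brand, minor version, then four-character brands to the end of the box), and the function names are hypothetical.

```python
import struct


def parse_ftyp(data):
    """Return (major_brand, minor_version, compatible_brands).

    Only the 32-bit size form is handled; 64-bit and
    'extends to end of file' sizes are out of scope for this sketch.
    """
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("not a file-type box")
    if size < 16 or size > len(data) or (size - 16) % 4:
        raise ValueError("unexpected file-type box size")
    major = data[8:12].decode("ascii")
    minor_version = struct.unpack(">I", data[12:16])[0]
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, minor_version, compatible


def minor_version_for(x, y):
    """Minor version field for a file conforming to version Z.x.y."""
    return x * 256 + y


def reader_accepts(data, recognized_brands):
    """Decide from the compatible-brands list, not from the primary brand."""
    _, _, compatible = parse_ftyp(data)
    return any(brand in recognized_brands for brand in compatible)
```

For example, a file conforming to version 9.2.0 of this specification would carry the minor version 2*256 + 0 = 512.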

5.4 Profiles 

5.4.1 General 

All 3GP files of this release shall conform to the general definitions in clauses 5.1-5.3. Additional profile-specific 
constraints are listed below. A 3GP file must conform to at least one profile and may conform to several profiles. 

5.4.2 General profile 

The 3GP General profile is branded "3gg9" and is a superset of all other profiles. It is used to identify 3GP files 
conformant to this specification, although they may not conform to any of the specific profiles listed below. 

NOTE: The General profile of 3GP has fewer restrictions than other profiles and is suitable for files not yet ready
to be delivered by MMS or to be streamed by a PSS server. A General 3GP file may for instance contain
several alternative tracks of media. After extracting a suitable set of tracks the file may be ready for MMS
and can be re-profiled as a Basic file. Alternatively, by adding streaming-server extensions, it may be
re-profiled as a Streaming-server profile.

5.4.3 Basic profile 

The 3GP Basic profile is branded "3gp9". 

The following constraints shall apply to a 3GP file conforming to Basic profile: 

- there shall be no references to external media outside the file, i.e. a file shall be self-contained;

- the maximum number of tracks shall be one for video (or alternatively one for scene description), one for audio
and one for text;

- the maximum number of sample entries shall be one per track for video and audio (but unrestricted for text and 
scene description); 

- there shall be no references between tracks, e.g., a scene description track shall not refer to a media track since 
all tracks are on equal footing and played in parallel by a conforming player. 
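The Basic profile constraints above can be expressed as a simple check over per-track summaries. The sketch below is illustrative only: the Track record and its fields are hypothetical, and it assumes a parser has already extracted the handler type ("vide", "soun", "text" or "sdsm"), the number of sample entries, and whether the track uses external data or track references.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Track:
    handler: str                      # "vide", "soun", "text" or "sdsm"
    sample_entries: int               # number of sample entries in the track
    self_contained: bool = True       # False if media lives outside the file
    references_tracks: bool = False   # True if it refers to another track


def basic_profile_violations(tracks):
    """Return a list of human-readable violations (empty if none found)."""
    problems = []
    count = Counter(t.handler for t in tracks)
    # One video track, or alternatively one scene description track.
    if count["vide"] + count["sdsm"] > 1:
        problems.append("more than one video or scene description track")
    if count["soun"] > 1:
        problems.append("more than one audio track")
    if count["text"] > 1:
        problems.append("more than one text track")
    for t in tracks:
        if not t.self_contained:
            problems.append(f"{t.handler} track references external media")
        if t.references_tracks:
            problems.append(f"{t.handler} track references another track")
        # Sample entries are limited only for video and audio.
        if t.handler in ("vide", "soun") and t.sample_entries > 1:
            problems.append(f"{t.handler} track has more than one sample entry")
    return problems
```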

NOTE 1: The Basic profile of 3GP in Release 6 or higher corresponds to 3GP files of earlier releases, which did not 
define profiles. 

NOTE 2: In order to maintain backward compatibility with Release 4 and Release 5, it is not recommended to use 
movie fragments in 3GP files for MMS. 
NOTE 3: For H.264 (AVC) video in a Basic profile 3GP file, the restriction on the number of video tracks implies
in particular that there shall be no alternative tracks (including switching tracks) and no separate tracks 
for parameter sets. 

NOTE 4: For DIMS scene description in a Basic profile 3GP file, the restriction on the number of scene description 
tracks implies in particular that there shall be no separate tracks for redundant DIMS units. 

NOTE 5: The handler types for tracks with video, audio, text and scene description are "vide", "soun", "text", and 
"sdsm", respectively. 



5.4.4 Streaming-server profile 



The 3GP Streaming-server profile is branded "3gs9" and is used in PSS. Conformance to this profile will guarantee 
interoperability between content creation tools and streaming servers, in particular for the selection of alternative 
encodings of content and adaptation during streaming. 

The following constraints shall apply to 3GP files conforming to Streaming-server profile:

- RTP hint tracks shall be included for all media tracks;

- RTP hint tracks shall comply with streaming as specified by PSS [3];

- SDP information shall be included, as specified in clause 7.5, where SDP fragments shall be stored in the hint
tracks with media-level control URLs referring to (the same) hint tracks;

- streaming-server extensions should be used for hint tracks, as defined in chapter 7.

The following requirements shall apply to servers conforming to this profile. A conforming server 

- shall understand and respect directions given in the streaming-server extensions, as defined in chapter 7;

- should understand hint tracks; 

- may override instructions in hint tracks. 

NOTE 1: The instructions given in RTP hint tracks shall be consistent with the PSS. In particular, sending times of 
RTP packets shall respect buffer constraints and be consistent with parameters used in SDP. 

NOTE 2: Earlier releases of the 3GPP file format did not define streaming-server extensions or profiles. The usage
of hint tracks was an internal implementation matter for servers outside the scope of the PSS 
specification. 

5.4.5 Progressive-download profile 

The 3GP Progressive-download profile is branded "3gr9". It is used to label 3GP files that are suitable for progressive 
download, i.e. a scenario where a file may be played during download (with some delay). 

The following constraints shall apply to 3GP files conforming to Progressive-download profile: 

- the "moov" box shall be placed right after the "ftyp" box in the beginning of the file; 

- all media tracks (if more than one) shall be interleaved with an interleaving depth of one second or less. 

NOTE 1: This profile functions as an aid and not a requirement for progressive download, which has been an
inherent feature of the 3GPP file format since the first version in Release 4. By parsing a 3GP file, a client
can always determine whether a file can be progressively downloaded, and then calculate the interleaving
depth from the meta-data in the "moov" box.

NOTE 2: The "interleaving depth of one second or less" means that:

- Each chunk contains one or more samples, with the total duration of the samples being either: no
greater than 1 second, or the duration of a single sample if that sample's duration is greater than 1
second;

- Within a track, chunks must be in decoding time order within the media-data box "mdat";

- It is recommended that, in "mdat", regardless of media type, the chunks for all tracks are stored in
ascending order by decoding time. However, this order may be perturbed so that, when two chunks
from different tracks overlap in time, the chunk of one track (e.g. audio) is stored before the chunk of
the other track (e.g. video), even if the first sample in the second track has a slightly earlier
timestamp than the first sample in the first track.
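The conditions of this clause lend themselves to a mechanical check. The sketch below is one possible reading, with hypothetical function names; it assumes that the top-level box types, the per-chunk sample durations (in units of the track timescale) and the chunk start times have already been extracted from the "moov" box.

```python
def moov_right_after_ftyp(box_types):
    """box_types: top-level box types in file order, e.g. ["ftyp", "moov", "mdat"]."""
    return box_types[:2] == ["ftyp", "moov"]


def chunk_within_depth(sample_durations, timescale):
    """One chunk: at most one second in total, or a single (long) sample.

    sample_durations are in timescale units, so one second equals `timescale`.
    """
    if not sample_durations:
        return False  # a chunk contains one or more samples
    return len(sample_durations) == 1 or sum(sample_durations) <= timescale


def chunks_in_decoding_order(chunk_start_times):
    """chunk_start_times: decoding times of one track's chunks, in "mdat" order."""
    return all(a <= b for a, b in zip(chunk_start_times, chunk_start_times[1:]))
```

Integer timescale units are used instead of floating-point seconds so that a chunk of exactly one second is not rejected by rounding.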

5.4.6 Extended-presentation profile 

The 3GP Extended-presentation profile is branded "3ge9". It enables a 3GP file to carry any kind of multimedia 
presentation composed of tracks, media files and a scene description. 

The following constraint shall apply to 3GP files conforming to Extended-presentation profile: 

- there shall be an extended presentation as defined in clause 11.

The following requirement shall apply to a player conforming to this profile. A conforming player 

- shall render the content of the 3GP file as prescribed by the contained scene description (primary item). 

NOTE: The scene description can address resources by using URLs as described in clause 11.3. In particular, it
can refer to media in tracks and items and also to scene description updates in scene description tracks. 



5.4.7 Media Stream Recording profile 



The 3GP Media Stream Recording Profile is branded "3gt9". It is used to label 3GP files that contain recordings of 
multimedia streams, e.g., from a PSS or an MBMS session. 

The following constraints apply to 3GP files conforming to the Media Stream Recording Profile: 

- Non-protected media streams may be contained in RTP reception hint tracks or in media tracks or in both as
specified in [38].

- One RTCP hint track per media stream may be contained as specified in [38].

- Protected media data may be contained in SRTP reception hint tracks as specified in [38].

- Control information, i.e., SRTCP sender reports, necessary to render the protected media in SRTP reception hint
tracks shall be contained in one SRTCP reception hint track per SRTP reception hint track, as specified in [38].

- MIKEY MBMS Traffic Key messages [39] necessary to access the information stored in SRTP and SRTCP 
reception hint tracks shall be contained in key message tracks as described in clause 12.2. 

- Key management information necessary to render the content of the 3GP file shall be contained as described in 
clause 12.2, provided that at least one SRTP reception hint track is present. 

- SDP information shall be included as specified in clause 12.3. 

The following requirements shall apply to 3GP players conforming to this profile. A conforming player:

- shall be able to reconstruct the received media stream from media tracks and RTP/RTCP hint tracks.

- shall be able to extract the unprotected content from the 3GP file, provided that the player has access to required 
MBMS Service Keys or is able to obtain these using the methods specified in [39]. 

5.4.8 File-delivery server profile 

The File-delivery server profile is branded "3gf9". Conformance to this profile will guarantee interoperability between 
content creation tools and file delivery servers. 

The following constraints shall apply to 3GP files conforming to File-delivery server profile: 

- File Delivery Hint Tracks and File Delivery Format Extensions, as specified in [7], shall be used for files
intended for transmission over FLUTE [42].

The following requirements shall apply to servers conforming to this profile. 
- A conforming server shall understand and respect File Delivery Hint Tracks and File Delivery Format
Extensions, as specified in [7].

5.4.9 Adaptive-Streaming profile 

The 3GP Adaptive-Streaming profile is branded "3gh9". It is used to label 3GP files that are primarily suitable for 
adaptive file-based streaming. 

The following constraints shall apply to 3GP files conforming to Adaptive-Streaming profile:

• the "moov" box shall be placed in the beginning of the file right after the "ftyp" box and a possibly present 
"pdin" box; 

• all movie data shall be contained in Movie Fragments, i.e. the tracks in the "moov" box shall not contain 
any samples (i.e. the entry_count in the 'stts', 'stsc', and 'stco' boxes shall be set to 0). 

• the "moov" box shall contain an "mvex" box to indicate the presence of movie fragments. 

• the "moov" box shall be followed by one or more "moof" and optionally "mdat" box pairs. 

• each "moof" box shall contain at least one track fragment. 

• The "moof" boxes shall use movie-fragment relative addressing for media data that does not use external 
data references and the flag "default-base-is-moof" shall also be set; absolute byte-offsets shall not be used 
for this media data. In a movie fragment, the durations by which each track extends should be as close to 
equal as practical. In particular, as movie fragments are accumulated, the track durations should remain 
close to each other and there should be no 'drift'. 

• For any track, any 'tfad' or any 'tfdt' box, if present, shall duplicate the operations of a possibly present 'elst' 
box; when any 'tfad' or any 'tfdt' is used, the 'elst' box, if present, shall be ignored. 

3GP files conforming to this profile may contain:

• segment type ("styp") boxes as specified in clause 13.2,

• track fragment adjustment ("tfad") boxes as specified in clause 13.3,

• segment index ("sidx") boxes as specified in clause 13.4, and

• track fragment decode time ("tfdt") boxes as specified in clause 13.5.

If the "meta" box is present and contains the Media Presentation Description (MPD as defined in TS 26.234 [3]) then 
the "meta" box shall be contained within the "moov" box. In this case the "meta" box shall contain a 'hdlr' box with 
handler_type 'mpd ' followed by an 'xml ' box containing the MPD. 

If the 'meta' box is present and contains a link to the MPD, then the 'meta' box shall be contained within the 'moov' box. 
In this case the 'meta' box shall contain a 'hdlr' box with handler_type 'mpdl' followed by a 'dinf' box. The 'dinf' box
shall contain a 'dref' box with exactly one entry, which is a 'url ' box containing the URL of the MPD.
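The top-level layout required by the Adaptive-Streaming profile can be checked from the sequence of top-level box types alone. The following sketch is an interpretation, not a normative test: the function name is hypothetical, and the reading that every "mdat" box directly follows its "moof" box is an assumption drawn from the phrase "box pairs" above.

```python
def adaptive_layout_problems(box_types):
    """box_types: top-level box types in file order.

    Returns a list of problems found (empty if the layout looks conformant).
    """
    problems = []
    # A "pdin" box may sit between "ftyp" and "moov".
    if box_types[1:2] == ["pdin"]:
        prefix = ["ftyp", "pdin", "moov"]
    else:
        prefix = ["ftyp", "moov"]
    if box_types[:len(prefix)] != prefix:
        problems.append('"moov" does not follow "ftyp" (and "pdin", if present)')
        return problems
    rest = box_types[len(prefix):]
    if "moof" not in rest:
        problems.append('no "moof" box follows "moov"')
    for i, box_type in enumerate(rest):
        # Assumed pairing rule: an "mdat" belongs to the "moof" just before it.
        if box_type == "mdat" and (i == 0 or rest[i - 1] != "moof"):
            problems.append('"mdat" box is not preceded by a "moof" box')
    return problems
```

Checks that need box contents, such as the presence of "mvex" inside "moov" or zero entry counts in 'stts', 'stsc' and 'stco', are outside the scope of this sketch.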



5.4.10 Media Segment Profile 



The 3GP Media Segment profile is branded "3gm9". It is used to label segments conforming to this release. Media 
Segments are defined in 3GPP TS 26.247 [49]. 



5.5 File-branding guidelines 



The file-type brands defined in this specification are used to label 3GP files belonging to this release and conforming to 
one or more profiles. 3GP files may also conform to earlier Releases or even to other file formats, such as MP4, which 
is also derived from the ISO base media file format [7]. 

Table 5.1 contains a non-exhaustive list of examples with 3GP files for various purposes. Note, however, that it only 
gives typical or suggested uses. Both writers and readers of files should exercise care when using brand identifiers. It is 
worth repeating the general guidelines here, remembering that a brand identifies a specification or a conformance point
in a specification; its presence in a file indicates both: 

- that the file conforms to the specification; it includes everything required by, and nothing contrary to the 
specification (though there may be other material); 

- that a reader implementing that specification (possibly only that specification) is given permission to read and 
interpret the file. 

All 3GP files of Release 5 or later shall contain the compatible brand "isom" indicating that they conform to the ISO 
base media file format, unless the reader is required to interpret extensions specific to the AVC file format [20], for 
which case the compatible brand "avc1" shall be used instead (see note 2), or extensions specific to extended
presentations (see clause 11), for which case the compatible brand "iso2" shall be used (see note 3). The major brand 
shall be included in the compatible brands list as well. If a file contains more than one (3GPP) brand in the compatible 
brands list, the major brand indicates the 'best use' of the file. For example, a Release-5 file with audio combined with 
Timed text is best played by a Release-5 player, but may also be played by a Release-4 player that does not support 
timed text. 

NOTE 1: Since movie fragments are not allowed in Release 4 and Release 5, a fragmented 3GP file should not
contain "3gp4" or "3gp5" as brand or compatible brand. A player that does not support movie fragments
will only be able to play the first fragment of a fragmented file.

NOTE 2: Consider the brands "isom" and "avc1". The first indicates conformance to the base structure of the ISO
base media file format [7]. The second, conformance to the AVC-specific extensions (structures such as
sample groups, for example) [20]. A file labelled as "isom" and "avc1" conformant is indicating that
either these extensions are not present, or if present, they can be ignored (as an "isom" reader will not
understand them). If the writer desires that only readers supporting the extensions read a file, then the
"isom" brand would be omitted. These extensions are all optional (i.e. none are required to be in a file,
though if they are, an "avc1"-conformant reader must interpret them), and therefore a file not using them
is still "avc1" conformant.

NOTE 3: The second version of the ISO base media file format defines the brand "iso2" that in addition to "isom" 
indicates conformance to extensions to the first version. 
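A writer can derive the compatible-brands list from these guidelines. The sketch below is illustrative and its function name and parameters are hypothetical. It assumes that "iso2" takes precedence over "avc1" when both kinds of extension are required, which this clause does not state explicitly.

```python
def compatible_brands(major, earlier_brands=(), needs_avc_extensions=False,
                      extended_presentation=False, fragmented=False):
    """Build a compatible-brands list for a 3GP file.

    major: primary brand; it is always listed as a compatible brand too.
    earlier_brands: brands of other profiles or releases the file conforms to.
    """
    brands = [major]
    for brand in earlier_brands:
        # Movie fragments are not allowed in Release 4 and Release 5.
        if fragmented and brand in ("3gp4", "3gp5"):
            continue
        if brand not in brands:
            brands.append(brand)
    if extended_presentation:
        brands.append("iso2")   # assumed to take precedence over "avc1"
    elif needs_avc_extensions:
        brands.append("avc1")   # used instead of "isom"
    else:
        brands.append("isom")
    return brands
```

With these rules, a Release 6 progressive-download file whose reader must interpret the AVC extensions gets the list 3gr6, 3gp6, avc1.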
Table 5.1: Examples of brand usage in 3GP files

Conformance | Suffix | Brand | Compatible brands | Example content
------------|--------|-------|-------------------|----------------

MMS and download: Files shall contain one or more of the brands 3gp4, 3gp5, 3gp6, 3gp7 and 3gp8. It is good
practice to include compatible brands of earlier releases to enable legacy players to play the files.

Release 4 | .3gp | 3gp4 | 3gp4 | H.263 and AMR
Release 5, 4 | .3gp | 3gp5 | 3gp5, 3gp4, isom | H.263 and AMR
Release 6, 5, 4 | .3gp | 3gp6 | 3gp6, 3gp5, 3gp4, isom | H.263 and AMR
Release 7, 6, 5, 4 | .3gp | 3gp7 | 3gp7, 3gp6, 3gp5, 3gp4, isom | H.263 and AMR
Release 8, 7, 6, 5, 4 | .3gp | 3gp8 | 3gp8, 3gp7, 3gp6, 3gp5, 3gp4, isom | H.263 and AMR
Release 6, 5, 4 | .3gp | 3gp6 | 3gp6, 3gp5, 3gp4, isom | H.263, AMR and Timed text
Release 6, 5 | .3gp | 3gp6 | 3gp6, 3gp5, isom | Timed text
Release 6 | .3gp | 3gp6 | 3gp6, isom | H.264 (AVC) Baseline profile and AMR
Release 6 | .3gp | 3gp6 | 3gp6, isom | fragmented H.263 and AMR
Release 7 | .3gp | 3gp7 | 3gp7, isom | DIMS and AMR

Progressive download and MMS

Release 6, 5, 4 | .3gp | 3gr6 | 3gr6, 3gp6, 3gp5, 3gp4, isom | H.263
Release 6, 5, 4 | .3gp | 3gr6 | 3gr6, 3gp6, 3gp5, 3gp4, isom | interleaved H.263 and AMR
Release 6 | .3gp | 3gr6 | 3gr6, 3gp6, isom | fragmented and interleaved H.263 and AMR
Release 6 | .3gp | 3gr6 | 3gr6, 3gp6, avc1 | interleaved H.264 (AVC) Baseline profile and AMR

Streaming servers: Some files may in principle also be used for MMS or download.

Release 6 | .3gp | 3gs6 | 3gs6, isom | AMR and hint track
Release 6 | .3gp | 3gs6 | 3gs6, isom | 2 tracks H.263 and 2 hint tracks
Release 6, 5, 4 | .3gp | 3gs6 | 3gs6, 3gp6, 3gp5, 3gp4, isom | H.263, AMR and hint tracks

Extended presentations:

Release 7, 6 | .3gp | 3ge7 | 3ge7, 3ge6, iso2 | SMIL, AMR and JPEG images
Release 7 | .3gp | 3ge7 | 3ge7, iso2 | DIMS, AMR, H.264 (AVC) Baseline profile and JPEG images

General purpose: Files that are not yet suitable for MMS, download or PSS streaming servers.

Release 6 | .3gp | 3gg6 | 3gg6, isom | 4 tracks H.263 (and no hint tracks)
Release 6 | .3gp | 3gg6 | 3gg6, isom | 2 tracks H.263, 3 tracks AMR

3GP file, also conforming to MP4

Release 4, 5 and MP4 | .3gp | 3gp5 | 3gp5, 3gp4, mp42, isom | MPEG-4 video

MP4 file, also conforming to 3GP

Release 5 and MP4 | .mp4 | mp42 | mp42, 3gp5, isom | MPEG-4 video and AAC

Media Stream Recording file

Release 8 | .3gp | 3gt8 | 3gt8, isom | SRTP reception hint and key message tracks
Release 8 | .3gp | 3gt8 | 3gt8, isom | H.264 (AVC) Baseline profile and corresponding RTP reception hint track, reception hint track for AAC
Release 9 | .3gp | 3gt9 | 3gt9, isom | H.264 (AVC) High Profile and corresponding RTP reception hint track, reception hint track for AAC

Adaptive HTTP Streaming:

Release 9 | .3gp | 3gh9 | 3gp6, 3gp7, 3gp8, 3ge7, isom | 7 H.264 (AVC) tracks at different bitrates in one alternate track group, 3 AAC tracks with different languages in one alternate group, no hint tracks, movie fragments
6 Codec registration

6.1 General



The purpose of this clause is to define the necessary structure for integration of the H.263, MPEG-4 Visual, AMR, 
AMR-WB, Extended AMR-WB (AMR-WB+), Enhanced aacPlus and AAC media specific information in a 3GP file. 
Clause 6.2 gives some background information about the Sample Description box in the ISO base media file format [7] 
and clauses 6.3 and 6.4 about the MP4VisualSampleEntry box and the MP4AudioSampleEntry box in the MPEG-4 file 
format [14]. The definitions of the Sample Entry boxes for AMR, AMR-WB, AMR-WB+ and H.263 are given in 
clauses 6.5 to 6.10. The integration of timed text in a 3GP file is specified in [4], the integration of H.264 (AVC) is
specified in [20] [47] and the integration of DIMS is specified in [36] and clauses 5.4.3, 5.4.6 and 11 of the present 
document. 

AMR and AMR-WB data is stored in the stream according to the AMR and AMR-WB storage format for single 
channel header of Annex E [15], without the AMR magic numbers. 
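When AMR or AMR-WB data is imported from a stand-alone storage file, the magic number at the start of that file has to be dropped before the frames are written as samples. The sketch below illustrates this step only. The magic number values are those of the single-channel AMR and AMR-WB storage format (as published in RFC 4867) and are not quoted from the present document; the function name is hypothetical.

```python
# Single-channel storage-format magic numbers (assumed from the AMR
# storage format definition, not from this specification's text).
AMR_MAGIC = b"#!AMR\n"
AMR_WB_MAGIC = b"#!AMR-WB\n"


def strip_amr_magic(data):
    """Return the frame data of a single-channel AMR or AMR-WB storage file.

    If no known magic number is present, the data is returned unchanged.
    """
    for magic in (AMR_WB_MAGIC, AMR_MAGIC):
        if data.startswith(magic):
            return data[len(magic):]
    return data
```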

The 3GPP file format is the native storage format for AMR-WB+. The data stream, stored in samples of a 3GP file,
shall be formatted according to clause 8.3 of [21]. Each sample contains one or more AMR-WB+ storage units. The
number of storage units per sample may differ from sample to sample.



6.2 Sample Description box 



In an ISO file, the Sample Description Box gives detailed information about the coding type used, and any initialisation
information needed for that coding. The Sample Description Box can be found in the ISO file format Box Structure 
Hierarchy shown in figure 6.1. 



Movie Box
  Track Box
    Media Box
      Media Information Box
        Sample Table Box
          Sample Description Box

Figure 6.1: ISO File Format Box Structure Hierarchy
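The nesting in figure 6.1 can be followed programmatically by descending one container box at a time. The sketch below is a minimal illustration with hypothetical function names. It assumes the four-character types defined for these boxes in the ISO base media file format [7] ('moov', 'trak', 'mdia', 'minf', 'stbl', 'stsd'), handles only 32-bit box sizes, and follows the first track only.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # 64-bit and 'to end of file' sizes are not handled here
        yield box_type.decode("latin-1"), pos + 8, pos + size
        pos += size


def find_box(data, path):
    """Descend through nested boxes; return (body_start, body_end) or None."""
    start, end = 0, len(data)
    for name in path:
        for box_type, body_start, body_end in iter_boxes(data, start, end):
            if box_type == name:
                start, end = body_start, body_end
                break
        else:
            return None  # this level of the hierarchy is missing
    return start, end


# Path of the Sample Description Box, as in figure 6.1.
STSD_PATH = ["moov", "trak", "mdia", "minf", "stbl", "stsd"]
```

Calling find_box(file_bytes, STSD_PATH) returns the byte range of the Sample Description Box body of the first track, or None if any level is absent.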
The Sample Description Box can have one or more Sample Entries. Valid Sample Entries already defined for ISO and
MP4 include MP4AudioSampleEntry, MP4VisualSampleEntry and HintSampleEntry. The Sample Entries for AMR 
and AMR-WB shall be AMRSampleEntry, for AMR-WB+ it shall be AMRWPSampleEntry, for H.263 it shall be 
H263SampleEntry, for H.264 (AVC) it shall be AVCSampleEntry, for timed text it shall be TextSampleEntry, for 
DIMS it shall be DIMSSampleEntry, and for hint tracks it shall be HintSampleEntry. 

The format of SampleEntry and its fields are explained as follows: 

SampleEntry ::= MP4VisualSampleEntry |
                MP4AudioSampleEntry |
                AMRSampleEntry |
                AMRWPSampleEntry |
                H263SampleEntry |
                AVCSampleEntry |
                TextSampleEntry |
                DIMSSampleEntry |
                HintSampleEntry

Table 6.1: SampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| MP4VisualSampleEntry | | Entry type for visual samples defined in the MP4 specification. | |
| MP4AudioSampleEntry | | Entry type for audio samples defined in the MP4 specification. | |
| AMRSampleEntry | | Entry type for AMR and AMR-WB speech samples defined in clause 6.5 of the present document. | |
| AMRWPSampleEntry | | Entry type for AMR-WB+ audio samples defined in clause 6.9 of the present document. | |
| H263SampleEntry | | Entry type for H.263 visual samples defined in clause 6.6 of the present document. | |
| AVCSampleEntry | | Entry type for H.264 (AVC) visual samples defined in the AVC file format specification. | |
| TextSampleEntry | | Entry type for timed text samples defined in the timed text specification. | |
| DIMSSampleEntry | | Entry type for DIMS scene description samples defined in the DIMS specification. | |
| HintSampleEntry | | Entry type for hint track samples defined in the ISO specification. | |





Of the nine Sample Entries above, only the MP4VisualSampleEntry, MP4AudioSampleEntry, H263SampleEntry,
AMRSampleEntry and AMRWPSampleEntry are described in this clause. TextSampleEntry is defined in [4],
HintSampleEntry in [7], AVCSampleEntry in [20], and DIMSSampleEntry in [36]. 



6.3 MP4VisualSampleEntry box 

The MP4VisualSampleEntry Box is defined as follows:

MP4VisualSampleEntry ::= BoxHeader
                         Reserved_6
                         Data-reference-index
                         Reserved_16
                         Width
                         Height
                         Reserved_4
                         Reserved_4
                         Reserved_4
                         Reserved_2
                         Reserved_32
                         Reserved_2
                         Reserved_2
                         ESDBox



Table 6.2: MP4VisualSampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'mp4v' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference to use to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| ESDBox | | Box containing an elementary stream descriptor for this stream. | |





The stream type specific information is in the ESDBox structure, as defined in [14]. 

This version of the MP4VisualSampleEntry, with explicit width and height, shall be used for MPEG-4 video streams
conformant to this specification. 

NOTE: width and height parameters together may be used to allocate the necessary memory in the playback 
device without need to analyse the video stream. 
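As an illustration of the layout in table 6.2, the following Python sketch reads the fixed-size part of a visual sample entry and reports the width and height without touching the video stream. It is only a sketch under stated assumptions: the names VISUAL_ENTRY and parse_visual_sample_entry are hypothetical, fields are read big-endian as in the ISO base media file format [7], and the fixed part is taken to be the sum of the field sizes in the table (86 bytes including the box header).

```python
import struct

# Fixed part of table 6.2, big-endian, no padding:
# size, type, 6 reserved bytes, data-reference-index, 16 reserved bytes,
# width, height, 3 x Reserved_4, Reserved_2, 32 reserved bytes,
# Reserved_2, signed Reserved_2.
VISUAL_ENTRY = struct.Struct(">I4s6xH16xHH3IH32xHh")

def parse_visual_sample_entry(buf, offset=0):
    """Return the main fields of an MP4VisualSampleEntry or H263SampleEntry."""
    (size, box_type, data_ref_index, width, height,
     res_a, res_b, res_c, res_one, res_depth, res_minus_one) = \
        VISUAL_ENTRY.unpack_from(buf, offset)
    child_type = None
    child_at = offset + VISUAL_ENTRY.size
    # The codec-specific box (ESDBox here) follows the fixed fields.
    if size >= VISUAL_ENTRY.size + 8 and len(buf) >= child_at + 8:
        child_type = buf[child_at + 4:child_at + 8]
    return {
        "type": box_type,
        "data_reference_index": data_ref_index,
        "width": width,
        "height": height,
        "child_type": child_type,
    }
```

A player could, for example, multiply width by height to size its frame buffers before the first sample is decoded, which is the use the NOTE above describes.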

6.4 MP4AudioSampleEntry box 

The MP4AudioSampleEntry Box is defined as follows:

MP4AudioSampleEntry ::= BoxHeader
                        Reserved_6
                        Data-reference-index
                        Reserved_8
                        Reserved_2
                        Reserved_2
                        Reserved_4
                        TimeScale
                        Reserved_2
                        ESDBox
Table 6.3: MP4AudioSampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'mp4a' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference to use to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from track | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| ESDBox | | Box containing an elementary stream descriptor for this stream. | |





The stream type specific information is in the ESDBox structure, as defined in [14]. Enhanced aacPlus stored in .3GP 
files shall not use implicit signalling (as defined in [13]). 



6.5 AMRSampleEntry box 



For narrow-band AMR, the box type of the AMRSampleEntry Box shall be 'samr'. For AMR wideband (AMR-WB), 
the box type of the AMRSampleEntry Box shall be 'sawb'. 

The AMRSampleEntry Box is defined as follows: 

AMRSampleEntry ::= BoxHeader
                   Reserved_6
                   Data-reference-index
                   Reserved_8
                   Reserved_2
                   Reserved_2
                   Reserved_4
                   TimeScale
                   Reserved_2
                   AMRSpecificBox
Table 6.4: AMRSampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'samr' or 'sawb' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference to use to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from media header box of this media | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRSpecificBox | | Information specific to the decoder. | |





Compared with the MP4AudioSampleEntry Box, the main difference in the AMRSampleEntry Box is the replacement of
the ESDBox, which is specific to MPEG-4 systems, with a box suitable for AMR and AMR-WB. The
AMRSpecificBox field structure is described in clause 6.7.



6.6 H263SampleEntry box 

The box type of the H263SampleEntry Box shall be 's263'. 
The H263SampleEntry Box is defined as follows: 



H263SampleEntry ::= BoxHeader
                    Reserved_6
                    Data-reference-index
                    Reserved_16
                    Width
                    Height
                    Reserved_4
                    Reserved_4
                    Reserved_4
                    Reserved_2
                    Reserved_32
                    Reserved_2
                    Reserved_2
                    H263SpecificBox
Table 6.5: H263SampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 's263' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference to use to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| H263SpecificBox | | Information specific to the H.263 decoder. | |





Compared with the MP4VisualSampleEntry Box, the main difference in the H263SampleEntry Box is the replacement of the
ESDBox, which is specific to MPEG-4 systems, with a box suitable for H.263. The H263SpecificBox field structure for
H.263 is described in clause 6.8.

6.7 AMRSpecificBox field for AMRSampleEntry box 

The AMRSpecificBox fields for AMR and AMR-WB shall be as defined in table 6.6. The AMRSpecificBox for the 
AMRSampleEntry Box shall always be included if the 3GP file contains AMR or AMR-WB media. 

Table 6.6: The AMRSpecificBox fields for AMRSampleEntry

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "damr" |
| DecSpecificInfo | AMRDecSpecStruc | Structure which holds the AMR and AMR-WB specific information | |





BoxHeader Size and Type: indicate the size and type of the AMR decoder-specific box. The type must be "damr".

DecSpecificInfo: the structure where the AMR and AMR-WB stream specific information resides.

The AMRDecSpecStruc is defined as follows:

struct AMRDecSpecStruc {
    Unsigned int (32) vendor
    Unsigned int (8)  decoder_version
    Unsigned int (16) mode_set
    Unsigned int (8)  mode_change_period
    Unsigned int (8)  frames_per_sample
}

The definitions of AMRDecSpecStruc members are as follows:

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field, which may be used by the 
decoding end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this 
field. Else, it is recommended that the manufacturer creates a four character code which best addresses the 
manufacturer's name. It can be safely ignored.

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way.
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored.

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the
mode is calculated according to the 4 bit FT field of the AMR or AMR-WB frame structure. The mode_set bit structure
is as follows: (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and B8
corresponds to Mode 8.

The mapping of existing AMR modes to FT is given in table 1a in [16]. A value of 0x81FF means all modes and
comfort noise frames are possibly present in an AMR stream.

The mapping of existing AMR-WB modes to FT is given in table 1a in TS 26.201 [17]. A value of 0x83FF means all
modes and comfort noise frames are possibly present in an AMR-WB stream.

As an example, if mode_set = 0000000110010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream.
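The bit-to-mode mapping described above can be checked with a few lines of Python. This is an illustrative sketch only; the function name modes_in_mode_set is hypothetical and simply lists the indices of the bits that are set, with bit 0 as the least significant bit.

```python
def modes_in_mode_set(mode_set):
    """Return the FT indices whose bits are set in a 16-bit mode_set value."""
    return [bit for bit in range(16) if mode_set & (1 << bit)]

# The example from the text: bits 0, 2, 4, 7 and 8 are set.
example_modes = modes_in_mode_set(0b0000000110010101)
```

With this mapping, 0x81FF sets bits 0 to 8 and bit 15, and 0x83FF sets bits 0 to 9 and bit 15.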

mode_change_period: defines a number N, which restricts the mode changes only at a multiple of N frames. If no 
restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions apply to it 
according to the frames_per_sample field: 

if (mode_change_period < frames_per_sample)
    frames_per_sample = k x (mode_change_period)
else if (mode_change_period > frames_per_sample)
    mode_change_period = k x (frames_per_sample)

where k : integer [2, ...]

If mode_change_period is equal to frames_per_sample, then the mode is the same for all frames inside one sample. 
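The relation between mode_change_period and frames_per_sample can be expressed as a small consistency check. The sketch below is one possible reading of the rules above, with a hypothetical function name: a value of 0 means no restriction, equal values are allowed, and otherwise the larger value has to be an integer multiple (k of 2 or more) of the smaller one.

```python
def mode_change_period_is_consistent(mode_change_period, frames_per_sample):
    """Check the multiple-of-k rule between the two AMRDecSpecStruc fields."""
    if mode_change_period == 0:
        return True  # no restriction on mode changes
    if mode_change_period == frames_per_sample:
        return True  # one mode per sample
    larger = max(mode_change_period, frames_per_sample)
    smaller = min(mode_change_period, frames_per_sample)
    # The values differ here, so an exact multiple implies k >= 2.
    return larger % smaller == 0
```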

frames_per_sample: defines the number of frames to be considered as 'one sample' inside the 3GP file. This number 
shall be greater than 0 and less than 16. A value of 1 means each frame is treated as one sample. A value of 10 means
that 10 frames (of duration 20 msec each) are put together and treated as one sample. It must be noted that, in this case, 
one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For the last sample of the stream, the number of 
frames can be smaller than frames_per_sample, if the number of remaining frames is smaller than frames_per_sample. 

NOTE 1: The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc
members.

NOTE 2: The following AMR MIME parameters are not relevant to PSS: {mode_set, mode_change_period,
mode_change_neighbor}. PSS servers should not send these parameters in SDP, and PSS clients shall
ignore these parameters if received.
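To illustrate clause 6.7 as a whole, the following Python sketch reads an AMRSpecificBox and derives the sample duration from frames_per_sample, using the 20 msec frame duration quoted above. The names DAMR and parse_damr are hypothetical, and big-endian byte order is assumed as in the ISO base media file format [7]; with the field sizes of AMRDecSpecStruc the whole box is 17 bytes long.

```python
import struct

# BoxHeader (size, type) followed by AMRDecSpecStruc:
# vendor, decoder_version, mode_set, mode_change_period, frames_per_sample.
DAMR = struct.Struct(">I4s4sBHBB")

def parse_damr(buf, offset=0):
    """Parse a 'damr' box and return its fields as a dictionary."""
    (size, box_type, vendor, decoder_version,
     mode_set, mode_change_period, frames_per_sample) = \
        DAMR.unpack_from(buf, offset)
    if box_type != b"damr":
        raise ValueError("not an AMRSpecificBox")
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample out of range")
    return {
        "vendor": vendor,
        "decoder_version": decoder_version,
        "mode_set": mode_set,
        "mode_change_period": mode_change_period,
        "frames_per_sample": frames_per_sample,
        # Each AMR or AMR-WB frame lasts 20 msec.
        "sample_duration_ms": frames_per_sample * 20,
    }
```

For frames_per_sample = 10 the sketch reports 200 msec per sample, matching the worked figure in the frames_per_sample definition.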

6.8 H263SpecificBox field for H263SampleEntry box 

The H263SpecificBox fields for H.263 shall be as defined in table 6.7. The H263SpecificBox for the
H263SampleEntry Box shall always be included if the 3GP file contains H.263 media. 

The H263SpecificBox for H263 is composed of the following fields. 
Table 6.7: The H263SpecificBox fields for H263SampleEntry

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "d263" |
| DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 specific information | |
| BitrateBox | | Specific bitrate information (optional) | |





BoxHeader Size and Type: indicate the size and type of the H.263 decoder-specific box. The type must be "d263".

DecSpecificInfo: This is the structure where the H.263 stream specific information resides.

H263DecSpecStruc is defined as follows:

struct H263DecSpecStruc {
    Unsigned int (32) vendor
    Unsigned int (8)  decoder_version
    Unsigned int (8)  H263_Level
    Unsigned int (8)  H263_Profile
}



The definitions of H263DecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding 
end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this field. Else, 
it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It
can be safely ignored.

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way.
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored.

H263_Level and H263_Profile: These two parameters define which H263 profile and level is used. These parameters 
are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [9]. 

EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0} 

EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10 , H263_Profile = 3} 

NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc 
members. 

The BitrateBox field shall be as defined in table 6.8. The BitrateBox may be included if the 3GP file contains H.263 
media. 

The BitrateBox is composed of the following fields. 

Table 6.8: The BitrateBox fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "bitr" |
| DecBitrateInfo | DecBitrStruc | Structure which holds the Bitrate information | |





BoxHeader Size and Type: indicate the size and type of the bitrate box. The type must be "bitr".

DecBitrateInfo: This is the structure where the stream bitrate information resides.

DecBitrStruc is defined as follows:

struct DecBitrStruc {
    Unsigned int (32) Avg_Bitrate
    Unsigned int (32) Max_Bitrate
}



The definitions of DecBitrStruc members are as follows: 

Avg_Bitrate: the average bitrate in bits per second of this elementary stream. For streams with variable bitrate this 
value shall be set to zero. 

Max_Bitrate: the maximum bitrate in bits per second of this elementary stream in any time window of one second 
duration. 
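The H263SpecificBox and its optional BitrateBox can be read together. The following Python sketch is an illustration under stated assumptions rather than a normative parser: the names D263, BITR and parse_d263 are hypothetical, fields are read big-endian, and the BitrateBox, when present, is assumed to follow H263DecSpecStruc directly inside the 'd263' box as the field order of table 6.7 suggests.

```python
import struct

D263 = struct.Struct(">I4s4sBBB")  # header + H263DecSpecStruc, 15 bytes
BITR = struct.Struct(">I4sII")     # header + DecBitrStruc, 16 bytes

def parse_d263(buf, offset=0):
    """Parse a 'd263' box, including the optional 'bitr' box if present."""
    size, box_type, vendor, decoder_version, level, profile = \
        D263.unpack_from(buf, offset)
    if box_type != b"d263":
        raise ValueError("not an H263SpecificBox")
    info = {
        "vendor": vendor,
        "decoder_version": decoder_version,
        "level": level,
        "profile": profile,
        "avg_bitrate": None,  # stays None when no BitrateBox is stored
        "max_bitrate": None,
    }
    inner = offset + D263.size
    if size >= D263.size + BITR.size and len(buf) >= inner + BITR.size:
        _, inner_type, avg, peak = BITR.unpack_from(buf, inner)
        if inner_type == b"bitr":
            info["avg_bitrate"] = avg
            info["max_bitrate"] = peak
    return info
```

An Avg_Bitrate of zero returned by this sketch indicates a variable bitrate stream, as stated in the Avg_Bitrate definition.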

6.9 AMRWPSampleEntry box 

The box type of the AMRWPSampleEntry Box shall be 'sawp'. 
The AMRWPSampleEntry Box is defined as follows: 

AMRWPSampleEntry ::= BoxHeader
                     Reserved_6
                     Data-reference-index
                     Reserved_8
                     Reserved_2
                     Reserved_2
                     Reserved_4
                     TimeScale
                     Reserved_2
                     AMRWPSpecificBox

Table 6.9: AMRWPSampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "sawp" |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference to use to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Sampling rate | Unsigned int(16) | See note 3. | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRWPSpecificBox | | Information specific to the AMR-WB+ decoder. | |





Compared with the MP4AudioSampleEntry Box, the main difference in the AMRWPSampleEntry Box is the replacement
of the ESDBox, which is specific to MPEG-4 systems, with a box suitable for AMR-WB+. The AMRWPSpecificBox
field structure is described in clause 6.10.

NOTE 1: In order to maintain backward compatibility with Release 4 and 5, the AMRWPSampleEntry should not
be used for AMR-WB+ streams that only contain AMR-WB modes. Such streams should be stored as
AMR-WB, i.e. by using the AMRSampleEntry with box type 'sawb', defined in clause 6.5, and the
storage format for single channel header of Annex E [15], without the AMR magic numbers. This way
file readers of previous releases will always be able to read AMR-WB streams stored in 3GP files.

NOTE 2: In order to enhance interoperability in Release 6, file readers capable of parsing tracks with AMR-WB+
should also be capable of parsing AMR-WB tracks (see note 1).

NOTE 3: The timescale of AMR-WB+ is fixed to 72kHz to accommodate the internal sampling rate which may 
vary over time. The sampling rate field of the AMRWPSampleEntry is therefore not coupled to the 
timescale, but contains the recommended playback sampling rate. 



6.10 AMRWPSpecificBox field for AMRWPSampleEntry box

The AMRWPSpecificBox fields for AMR-WB+ shall be as defined in table 6.10. The AMRWPSpecificBox for the 
AMRWPSampleEntry Box shall always be included if the 3GP file contains AMR-WB+ media. 

Table 6.10: The AMRWPSpecificBox fields for AMRWPSampleEntry

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "dawp" |
| DecSpecificInfo | AMRWPDecSpecStruc | Structure which holds the AMR-WB+ specific information | |





BoxHeader Size and Type: indicate the size and type of the AMR-WB+ decoder-specific box. The type must be 
"dawp". 

DecSpecificInfo: the structure where the AMR-WB+ stream specific information resides.

The AMRWPDecSpecStruc is defined as follows: 

struct AMRWPDecSpecStruc { 

Unsigned int (32) vendor 
Unsigned int (8) decoder_version 

} 

The definitions of AMRWPDecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field, which may be used by the 
decoding end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this 
field. Else, it is recommended that the manufacturer creates a four character code which best addresses the 
manufacturer's name. It can be safely ignored.

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way.
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored.

NOTE: For AMR and AMR-WB the AMRSpecificBox defines the number of frames that are stored in a sample. 
For AMR-WB+, however, the AMRWPSpecificBox does not specify an overall sample structure, as the
number of storage units per sample may differ from sample to sample. 



7 Streaming-server extensions



7.1 General 

This clause defines extensions to 3GP files to be used by streaming servers. The extensions enable a PSS server to relate 
different tracks and use them for selection and adaptation. In particular, they enable a PSS server to 

- generate SDP descriptions with alternatives, as specified in subclauses 5.3.3.3 - 5.3.3.4 of [3]; 

- select and combine tracks with alternative encodings of media before a presentation; 
- switch between tracks with alternative encodings during a streaming session;

- determine the decoding order, playout timestamp, and size for any ADU in an RTP payload.

In addition, the streaming server extensions enable a PSS server to

- use SRTP hint tracks for integrity protection.

The streaming-server extensions are intended to be used with hint tracks, although they are not limited to be used with 
hint tracks. Hint tracks are defined in the ISO base media file format [7] and provide (RTP) packetization instructions 
for media stored in a file. 

NOTE: The present document defines syntax and semantics for streaming-server extensions in 3GP files. It does
not define protocols for, e.g., how a PSS server signals alternative encodings or switches between 
different bitrate encodings. All protocols used by a PSS server are defined in [3]. 



7.2 Groupings of alternative tracks 



By default all enabled tracks in a 3GP file are streamed (played) simultaneously. However, the ISO base media file 
format [7] specifies that tracks that are alternatives to each other can be grouped into an alternate group. Tracks in an 
alternate group that can be used for switching can be further grouped into a switch group, as defined here. 



7.2.1 Alternate group



An alternate group is identified by an integer, alternate_group, in the Track Header box of each track. If this integer is
0 (default value), there is no information on possible relations to other tracks. If this integer is not 0, it should be the same
for tracks that contain alternate data for one another and different for tracks belonging to different such groups. Only
one track within an alternate group should be streamed or played at any time, and it must be distinguishable from other
tracks in the group via attributes such as bitrate, codec, language, packet size etc.



7.2.2 Switch group 



Switch group is identified by an integer, switch_group, in the Track Selection box of each track, as defined below. If 
this box is absent or if this integer is 0 (default value), there is no information on whether the track can be used for
switching during streaming or playing. If this integer is not 0, it shall be the same for tracks that can be used for 
switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group. 
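The rule that tracks of one switch group share one alternate group lends itself to a simple check. The following Python sketch is illustrative only; the function name switch_group_violations and the tuple layout (track ID, alternate_group, switch_group) are assumptions made for this example, and tracks with a switch_group of 0 are skipped because they carry no switching information.

```python
def switch_group_violations(tracks):
    """Return the switch groups whose tracks span more than one alternate group.

    tracks: iterable of (track_id, alternate_group, switch_group) tuples.
    """
    alternate_of_switch = {}
    violations = set()
    for _track_id, alternate_group, switch_group in tracks:
        if switch_group == 0:
            continue  # no switching information for this track
        first_seen = alternate_of_switch.setdefault(switch_group, alternate_group)
        if first_seen != alternate_group:
            violations.add(switch_group)
    return sorted(violations)
```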



7.3 Track Selection box



This subclause defines an optional box that aids the selection between tracks. It is used to encode switch groups and the 
criteria that should be used to differentiate tracks within alternate and switch groups. 

The Track Selection box is defined in table 7.1. It is contained in the User data box of the track it modifies. 

Note that Track Selection box is also defined in [7], with a slightly different set of defined attributes. One difference is 
that herein the definition of the attribute "Language" identified by 'lang' is included; while in [7] the definition of the 
attribute "Media language" identified by 'mela' is included. 

Table 7.1: Track Selection box fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "tsel" |
| BoxHeader.Version | Unsigned int(8) | | 0 |
| BoxHeader.Flags | Bit(24) | | 0 |
| SwitchGroup | int(32) | Switch group of track. | 0 (default) |
| AttributeList | Unsigned int(32) [N] | List of N attributes to the end of the box. | |





BoxHeader Size, Type, Version and Flags: indicate the size, type, version and flags of the Track Selection box. The 
type shall be "tsel" and the version shall be 0. No flags are defined. 
SwitchGroup: indicates switch group as defined in clause 7.2.2. It shall be 0 if the track is not intended for switching.

AttributeList: is a list of attributes to the end of the box. The attributes in this list should be used as differentiation 
criteria for tracks in the same alternate or switch group. Each attribute is associated with a pointer to the field or 
information that distinguishes the track. Attributes and pointers are listed in table 7.2. 

Table 7.2: Attributes for AttributeList of the Track Selection box

| Name | Attribute | Pointer |
|---|---|---|
| Language | "lang" | Value of grouping type LANG of 'alt-group' attribute in session-level SDP (defined in clause 5.3.3.4 of [3]) |
| Bandwidth | "bwas" | Value of 'b=AS' attribute in media-level SDP |
| Codec | "cdec" | SampleEntry (in Sample Description box of media track) |
| Screen size | "scsz" | Width and height fields of MP4VisualSampleEntry and H263SampleEntry (in media track) |
| Max packet size | "mpsz" | Maxpacketsize field in RTPHintSampleEntry |
| Media type | "mtyp" | Handlertype in Handler box (of media track) |
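As an illustration of tables 7.1 and 7.2, the following Python sketch reads a Track Selection box and returns the switch group together with the attribute codes that run to the end of the box. It is a sketch under stated assumptions: the names TSEL_HEAD and parse_tsel are hypothetical, fields are read big-endian, and the version and flags are read as one 32-bit word and then separated.

```python
import struct

# size, type, version (8 bits) + flags (24 bits), signed switch group.
TSEL_HEAD = struct.Struct(">I4sIi")

def parse_tsel(buf, offset=0):
    """Parse a 'tsel' box and return (switch_group, attribute_list)."""
    size, box_type, version_flags, switch_group = \
        TSEL_HEAD.unpack_from(buf, offset)
    if box_type != b"tsel":
        raise ValueError("not a Track Selection box")
    if version_flags >> 24 != 0:
        raise ValueError("unsupported Track Selection box version")
    if size < TSEL_HEAD.size or offset + size > len(buf):
        raise ValueError("truncated Track Selection box")
    first = offset + TSEL_HEAD.size
    last = offset + size
    # Each attribute is a four-character code; the list runs to the box end.
    attributes = [buf[pos:pos + 4].decode("ascii")
                  for pos in range(first, last - 3, 4)]
    return switch_group, attributes
```

A server could then look up each returned code in table 7.2 to find the field that distinguishes the track from the others in its group.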



7.4 Combining alternative tracks 



Tracks from different alternate groups are streamed (played) simultaneously. However, not all combinations of tracks
form suitable presentations. In order to suggest suitable combinations of tracks and also to reduce the number of
possible combinations, a content provider can encode preferred combinations of alternative tracks in a 3GP file. Such 
combinations are encoded by the 'alt-group' attribute in the session-level SDP fragment, as described in clause 7.5.3. 

If information on suitable combinations of tracks is missing, tracks with the lowest track IDs of each alternate group 
should be streamed (played) by default. 
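The default rule above can be sketched in a few lines of Python. The function name default_tracks and the pair layout (track ID, alternate_group) are hypothetical. The sketch also assumes that tracks with alternate_group equal to 0 are all kept, since for them no relation to other tracks is known and enabled tracks are streamed simultaneously by default.

```python
def default_tracks(tracks):
    """Pick the default tracks from (track_id, alternate_group) pairs."""
    lowest = {}
    ungrouped = []
    for track_id, alternate_group in tracks:
        if alternate_group == 0:
            # Assumption: ungrouped tracks are always streamed.
            ungrouped.append(track_id)
        elif (alternate_group not in lowest
              or track_id < lowest[alternate_group]):
            lowest[alternate_group] = track_id
    return sorted(ungrouped + list(lowest.values()))
```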



7.5 SDP



7.5.1 Session- and media-level SDP 

Fragments that together constitute an SDP description shall be contained in a 3GP file with streaming-server extensions. 
Session-level SDP, i.e. all lines before the first media-specific line ('m=' line), shall be stored as Movie SDP information 
within the User Data box, as specified in [7]. Media-level SDP, i.e. an 'm=' line and the lines before the next 'm=' line 
(or end of SDP) shall be stored as Track SDP information within the User data box of the corresponding track. Media- 
level SDP shall be contained in hint tracks (if provided). 
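The split between session-level and media-level SDP described above can be illustrated as follows. This Python sketch uses a hypothetical function name, split_sdp, and only separates the text of an SDP description into the fragments that would be stored as Movie SDP information and as Track SDP information; it does not write the boxes themselves.

```python
def split_sdp(sdp_text):
    """Split an SDP description into session-level and media-level fragments.

    Returns (session_lines, media_fragments), where each media fragment is
    the list of lines starting at an 'm=' line and ending before the next one.
    """
    session_lines = []
    media_fragments = []
    for line in sdp_text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        if line.startswith("m="):
            media_fragments.append([line])
        elif media_fragments:
            media_fragments[-1].append(line)
        else:
            session_lines.append(line)
    return session_lines, media_fragments
```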

7.5.2 Stored versus generated SDP fields 

The SDP information stored in a 3GP file should be as complete as possible, although some fields must be generated or 
modified by the server when a presentation is composed. Table 7.3 gives an overview of the SDP fields used by PSS, 
cf. Table A.1 in [3], and whether they are required to be included in 3GP files or whether the server is required to
generate them. 
Table 7.3: Overview of stored and generated fields in SDP



Type 


Description 


Contained in 
3GP file 


Generated by 
PSS server 


Session Description 


V 


Protocol version 


R 








Owner/creator and session identifier 





R 


S 


Session Name 


R 





I


Session information 








U 


URI of description 








E 


Email address 








P 


Phone number 








C 


Connection Information 





R 


B 


Bandwidth 
information 


AS 





(see note 7) 


RS 








RR 








TIAS 








One or more Time Descriptions (See below) 


Z 


Time zone adjustments 








K 


Encryption key 








A 


Session attributes 


control 





R 


range 


R 





alt-group 


R (see note 4) 





QoE-Metrics 








3GPP-Asset-Information








3GPP-Integrity-Key


N 


R (see note 6) 


3GPP-SDP-Auth 


N 


R (see note 6) 


maxprate 








One or more Media Descriptions (See below)




Time Description 


T 


Time the session is active 


R 













R 


Repeat times 










Media Description 


M 


Media name and transport address 


R 





I


Media title 








C 


Connection information 





R 


B 


Bandwidth 
information 


AS 


R 


(see note 7) 


RS 





R 


RR 





R 


TIAS 


R 





K 


Encryption Key 








A 


Attribute Lines 


control 





R 


range 


R 





fmtp 


R 





rtpmap 


R 





X-predecbufsize 


R (see note 5) 





X-initpredecbufperiod 


R (see note 5) 





X-initpostdecbufperiod 


R (see note 5) 





X-decbyterate 


R (see note 5) 





framesize 


R 





alt 


N 


R 


alt-default-id 


N 


R 


3GPP-Adaptation-Support


N 





QoE-Metrics 








3GPP-Asset-Information








3GPP-SRTP-Config 


N 


R (see note 6) 


rtcp-fb 


N 


R 


maxprate 


R 
Note 1: Fields in 3GP files are Required (R), Optional (O), or Not allowed (N).

Note 2: Servers are Required (R) to generate (possibly by copying or modifying from file), or have the 
Option (O) to generate/copy/modify, or are Not allowed (N) to modify fields. If a field is 
present in a file, it shall be copied or modified, but not omitted, by the server. 

Note 3: Some types shall only be included under certain conditions, as specified by PSS [3]. 

Note 4: The 'alt-group' attribute is required to be stored in 3GP files if it is used. 

Note 5: The "X-" attributes are required to be stored in 3GP files if they are used. They may either be 
specified in the PSS Annex G box '3gag' (see Clause 9) or in media-level SDP fragments. 

Note 6: The server is required to generate the "3GPP-Integrity-Key", "3GPP-SDP-Auth", and
"3GPP-SRTP-Config" attributes if integrity protection is used.

Note 7: The "b=AS" session bandwidth shall include UDP/IP overhead. The value shall be based on
IPv4 when stored in a file, but may be modified by the server to accommodate for IPv6. The
"maxprate" attribute is useful for such a conversion.



7.5.3 SDP attributes for alternatives 

Clauses 5.3.3.3 and 5.3.3.4 of [3] define SDP attributes that a server can use for presenting options to a client. These 
attributes can be used to encode suggested groupings of tracks, e.g. for selecting a certain language or target bitrate. 

Suggested groupings of tracks from different alternate groups, i.e. groupings of tracks that should be streamed together, 
are encoded by using the 'alt-group' attribute in the session-level SDP. Note that a server may have to prune options 
from such groupings if certain tracks are not presented to the client. 

Media-level SDP fragments shall not contain alternative-media attributes ('alt' and 'alt-default-id') as they are difficult to 
pre-encode. When the server combines several media-level SDP fragments from alternative tracks into one media-level 
SDP, it must generate the appropriate 'alt' and 'alt-default-id' attributes. This can be done by using the information 
provided in the 'alt-group' attributes in the session-level SDP. 

NOTE 1: Track IDs given by the Track Header boxes shall be used for alternative IDs ('alt-id') in attributes for SDP
alternatives. 

NOTE 2: Tracks with the lowest track IDs of each alternate group should be used as default tracks, i.e. used with 
the 'alt-default-id' attributes. 

7.6 SRTP 

Hinted content may require the use of SRTP [19] for streaming, e.g. for integrity protection, by using the hint-track 
format for SRTP defined here. It consists of a dedicated sample entry, which will be ignored by 3GP servers not capable 
of handling SRTP. 

SRTP hint tracks are formatted identically to RTP hint tracks defined in [7], except that: 

- the sample entry name is changed from 'rtp ' to 'srtp' to indicate to the server that SRTP is required; 

- an extra box is added to the sample entry which can be used to instruct the server in the nature of the on-the-fly 
encryption and integrity protection that must be applied. 

Samples of an SRTP hint track follow the same syntax for constructing RTP packets as RTP hint tracks. 

An SRTP Hint Sample Entry ('srtp') shall include an SRTP Process Box ('srpp') that may instruct the server as to which 
SRTP algorithms should be applied. It is defined in [7] and included in Table 7.4 for information.

Table 7.4: SRTPProcessBox

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | "srpp" |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| EncryptionAlgorithmRTP | Unsigned int(32) | 4cc identifying the algorithm | |
| EncryptionAlgorithmRTCP | Unsigned int(32) | 4cc identifying the algorithm | |
| IntegrityAlgorithmRTP | Unsigned int(32) | 4cc identifying the algorithm | |
| IntegrityAlgorithmRTCP | Unsigned int(32) | 4cc identifying the algorithm | |
| SchemeTypeBox | | Box containing the protection scheme. | |
| SchemeInformationBox | | Box containing the scheme information. | |

The SchemeTypeBox and SchemeInformationBox have the syntax defined in Tables 10.7 and 10.8, respectively.
They serve to provide the parameters required for applying SRTP. The Scheme Type Box is used to indicate the
necessary key management and security policy for the stream in extension to the defined algorithmic pointers provided
by the SRTP Process Box. The key management functionality is also used to establish all the necessary SRTP
parameters as listed in section 8.2 of [19]. The exact definition of protection schemes is out of the scope of the file
format.

The algorithms for encryption and integrity protection are defined by SRTP. Table 7.5 summarizes the format 
identifiers defined here. An entry of four spaces ($20$20$20$20) may be used to indicate that a process outside the file 
format decides the choice of algorithm for either encryption or integrity protection. 

Table 7.5: Algorithms for encryption and integrity protection

| Format | Algorithm |
|---|---|
| $20$20$20$20 | The choice of algorithm for either encryption or integrity protection is decided by a process outside the file format |
| ACM1 | Encryption using AES in Counter Mode with 128-bit key, as defined in Section 4.1.1 of [19] |
| AF81 | Encryption using AES in F8-mode with 128-bit key, as defined in Section 4.1.2 of [19] |
| ENUL | Encryption using the NULL-algorithm as defined in Section 4.1.3 of [19] |
| SHM2 | Integrity protection using HMAC-SHA-1 with 160-bit key, as defined in Section 4.2.1 of [19] |
| ANUL | Integrity protection not applied to RTP (but still applied to RTCP). Note: this is valid only for IntegrityAlgorithmRTP. |
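
As an illustration only, the format identifiers of Table 7.5 can be held in a lookup table and applied to the four algorithm fields of the SRTP Process Box in Table 7.4. The following Python sketch is not part of the file format. The names SRTP_ALGORITHMS, FIELD_NAMES and parse_srpp_algorithms are hypothetical, and the sketch assumes that its input is the box content following the 8-byte size/type header, with the Version and Flags fields occupying the first four bytes.

```python
import struct

# Format identifiers from Table 7.5, keyed by four-character code.
SRTP_ALGORITHMS = {
    b"    ": "decided by a process outside the file format",
    b"ACM1": "AES in Counter Mode, 128-bit key",
    b"AF81": "AES in F8-mode, 128-bit key",
    b"ENUL": "NULL encryption",
    b"SHM2": "HMAC-SHA-1, 160-bit key",
    b"ANUL": "integrity protection not applied to RTP",
}

# Order of the four algorithm fields in Table 7.4.
FIELD_NAMES = (
    "EncryptionAlgorithmRTP",
    "EncryptionAlgorithmRTCP",
    "IntegrityAlgorithmRTP",
    "IntegrityAlgorithmRTCP",
)


def parse_srpp_algorithms(content: bytes) -> dict:
    """Map each algorithm field of an 'srpp' box to a description.

    `content` is assumed to hold the bytes after the 8-byte size/type header.
    """
    if len(content) < 20:
        raise ValueError("SRTP Process Box content is too short")
    # Skip Version (1 byte) and Flags (3 bytes), then read four 4cc fields.
    codes = struct.unpack_from(">4s4s4s4s", content, 4)
    result = {}
    for name, code in zip(FIELD_NAMES, codes):
        if code == b"ANUL" and name != "IntegrityAlgorithmRTP":
            raise ValueError("ANUL is valid only for IntegrityAlgorithmRTP")
        result[name] = SRTP_ALGORITHMS.get(code, "unknown identifier")
    return result
```

The SchemeTypeBox and SchemeInformationBox that follow the four fields are not interpreted by this sketch.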



7.7 Aggregated RTP payloads 



An application data unit (ADU), normally being the smallest independently usable data unit, is specified as follows for
coding formats and RTP payload formats allowed in 3GP files: 

- For audio and speech, an ADU is specified as a coded frame intended for transport. 

- For H.263 an ADU consists of an entire RTP payload. 

- For MPEG-4 Visual an ADU consists of a complete or partial VOP in the RTP payload. 

- For H.264 (AVC), an ADU is a Network Adaptation Layer Unit (NALU). 

- For timed text, an ADU consists of any of the type 1-5 RTP payload units [28]. 

For encrypted RTP payloads, the actual ADUs are hidden within the encrypted payload. Some RTP payload formats 
allow aggregation of multiple ADUs into a single RTP payload. When any hint sample in an RTP hint track defines a
payload including multiple ADUs, each hint sample in the hint track shall comply with the following requirements:

- The extra-flag in the RTPPacket class of the hint sample shall be set to 1. This indicates that there is extra
information before the RTP constructors in the form of type-length-value sets.

- The extra information in the hint sample shall include a "3gau" structure as specified below.

class 3gppApplicationDataUnitInfoTLV extends Box("3gau") {
    unsigned int(16) entrycount;
    for (i=1; i<=entrycount; i++) {
        unsigned int(32) numbytes;
        unsigned int(64) decorder;
        unsigned int(32) timestampoffset;
    }
}

entrycount indicates the number of ADUs in the RTP payload. 

numbytes indicates the number of bytes of the i-th ADU in the RTP payload.

decorder indicates the decoding order of ADUs within the RTP hint track. The smaller the value of decorder, the earlier the
ADU is in decoding order. All ADUs shall have a unique value of decorder, and the assignment shall be done using
consecutive numbers. If two or more ADUs can be decoded virtually simultaneously, i.e. their relative decoding order is
undefined, they shall still be assigned consecutive numbers.

timestampoffset indicates the RTP timestamp offset of the i-th ADU relative to the timestamp in the RTP header of the
packet it will be transmitted in. The ADU's own timestamp value is equal to what it would have had if it were
transmitted in an RTP packet containing only the ADU.
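
The layout of the "3gau" structure can be sketched in Python as follows. This is a minimal, non-normative sketch: the names AduInfo, build_3gau and parse_3gau are hypothetical, and it assumes the plain box header of [7] (a 32-bit size followed by the four-character type) with all fields stored big-endian.

```python
import struct
from typing import List, NamedTuple


class AduInfo(NamedTuple):
    numbytes: int          # size of the ADU in the RTP payload, in bytes
    decorder: int          # decoding order within the RTP hint track
    timestampoffset: int   # offset relative to the RTP header timestamp


# One loop entry: 32-bit, 64-bit and 32-bit unsigned fields (16 bytes).
_ENTRY = struct.Struct(">IQI")


def build_3gau(entries: List[AduInfo]) -> bytes:
    """Serialize the ADU entries into a complete '3gau' box."""
    content = struct.pack(">H", len(entries))
    content += b"".join(_ENTRY.pack(*entry) for entry in entries)
    return struct.pack(">I4s", 8 + len(content), b"3gau") + content


def parse_3gau(box: bytes) -> List[AduInfo]:
    """Parse a complete '3gau' box back into its ADU entries."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"3gau" or size != len(box):
        raise ValueError("not a well-formed '3gau' box")
    (entrycount,) = struct.unpack_from(">H", box, 8)
    if size != 10 + entrycount * _ENTRY.size:
        raise ValueError("entrycount does not match the box size")
    return [
        AduInfo(*_ENTRY.unpack_from(box, 10 + i * _ENTRY.size))
        for i in range(entrycount)
    ]
```

For a payload aggregating two ADUs, the entries would carry consecutive decorder values, for example AduInfo(100, 0, 0) followed by AduInfo(80, 1, 160).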



8 Asset information 

8.1 General 

Asset information in a 3GP file describes the contained media. Clause 8.2 defines 3GPP asset meta data that is 
backward compatible with Release 6. However, in order to provide more enriched information for audio, it is also 
possible to include ID3 version 2 (ID3v2) tags as described in clause 8.3. 

8.2 3GPP asset meta data 

A user-data box ('udta'), as defined in [7], may be present in conforming files. It should reside within the Movie box, but
may reside within the Track box, following the hierarchy of boxes described in Clause 6.2.

Within the user-data box, there may reside sub-boxes that contain asset meta-data, taken from the list of boxes in tables 
8.1 through 8.10 below (zero or more sub-boxes of each kind, zero or one for each language or role of location 
information). Each of the sub-boxes conforms to the definition of a "full box" as specified in [7] (hence the 'Version' 
and 'Flags' fields). 

The following sub-boxes are in use for the following purposes: 

- titl - title for the media (see table 8.1) 

- dscp - caption or description for the media (see table 8.2) 

- cprt - notice about organisation holding copyright for the media file (see table 8.3) 

- perf - performer or artist (see table 8.4) 

- auth - author of the media (see table 8.5) 

- gnre - genre (category and style) of the media (see table 8.6) 

- rtng - media rating (see table 8.7) 

- clsf - classification of the media (see table 8.8) 

- kywd - media keywords (see table 8.9)

- loci - location information (see table 8.10)

- albm - album title and track number for the media (see table 8.11)

- yrrc - recording year for the media (see table 8.12)

- coll - name of the collection from which the media comes (see table 8.12a)

- urat - user "star" rating of the media (see table 8.12b)

- thmb - thumbnail image of the media (see table 8.12c)

Table 8.1: The Title box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'titl' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Title | String | Text of title | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 
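
The packing rule for the Language field can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that the Pad bit and the three 5-bit values form one 16-bit word with the first character in the most significant position after the Pad bit.

```python
def pack_language(code: str) -> int:
    """Pack a three-letter ISO 639-2/T code into a 16-bit value (Pad bit = 0)."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError("language code must be three lower-case letters")
    value = 0
    for c in code:
        # Each character is stored as its ASCII value minus 0x60 (1..26).
        value = (value << 5) | (ord(c) - 0x60)
    return value


def unpack_language(value: int) -> str:
    """Recover the three-letter code from the packed 16-bit value."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))
```

With this layout, 'eng' packs to 0x15C7 and 'und' packs to 0x55C4.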

Title: null-terminated string in either UTF-8 or UTF-16 characters, giving title information. If UTF-16 is used, the
string shall start with the BYTE ORDER MARK (0xFEFF).
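
A reader of these string fields has to distinguish the two encodings by the presence of the byte order mark. The following Python sketch shows one possible way to do so; the function name decode_3gp_string is hypothetical, and the sketch assumes that a UTF-16 string is terminated by a 16-bit zero and that a string without a byte order mark is UTF-8.

```python
def decode_3gp_string(data: bytes) -> str:
    """Decode a null-terminated UTF-8 or UTF-16 string from the start of data."""
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        # UTF-16: scan 16-bit code units after the BOM for the terminator.
        end = len(data) - len(data) % 2
        for i in range(2, end, 2):
            if data[i:i + 2] == b"\x00\x00":
                end = i
                break
        # The "utf-16" codec consumes the BOM and selects the byte order.
        return data[:end].decode("utf-16")
    # UTF-8: the terminator is a single zero byte.
    end = data.find(b"\x00")
    if end < 0:
        end = len(data)
    return data[:end].decode("utf-8")
```

The same helper applies to the other string fields of the asset meta-data boxes, since they share this encoding rule.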

Table 8.2: The Description box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'dscp' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Description | String | Text of description | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Description: null-terminated string in either UTF-8 or UTF-16 characters, giving description information. If UTF-16
is used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.3: The Copyright box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'cprt' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Copyright | String | Text of copyright notice | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive.

Copyright: null-terminated string in either UTF-8 or UTF-16 characters, giving copyright information. If UTF-16 is
used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.4: The Performer box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'perf' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Performer | String | Text of performer | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Performer: null-terminated string in either UTF-8 or UTF-16 characters, giving performer information. If UTF-16 is
used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.5: The Author box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'auth' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Author | String | Text of author | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Author: null-terminated string in either UTF-8 or UTF-16 characters, giving author information. If UTF-16 is used,
the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.6: The Genre box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'gnre' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Genre | String | Text of genre | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Genre: null-terminated string in either UTF-8 or UTF-16 characters, giving genre information. If UTF-16 is used,
the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.7: The Rating box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'rtng' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| RatingEntity | Unsigned int(32) | Four-character code rating entity | |
| RatingCriteria | Unsigned int(32) | Four-character code rating criteria | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| RatingInfo | String | Text of media-rating information | |

RatingEntity: four-character code that indicates the rating entity grading the asset, e.g., 'BBFC'. The values of this
field should follow common names of worldwide movie rating systems, such as those mentioned in
[http://www.movie-ratings.net/, October 2002].

RatingCriteria: four-character code that indicates which rating criteria are being used for the corresponding rating 
entity, e.g., "PG13". 

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

RatingInfo: null-terminated string in either UTF-8 or UTF-16 characters, giving rating information. If UTF-16 is
used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.8: The Classification box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'clsf' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| ClassificationEntity | Unsigned int(32) | Four-character code classification entity | |
| ClassificationTable | Unsigned int(16) | Index to classification table | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| ClassificationInfo | String | Text of media-classification information | |

ClassificationEntity: four-character code that indicates the classification entity classifying the asset. The values of this 
field should follow names of worldwide classification systems to be identified, but may be assigned blanks to 
indicate no specific classification entity. 

ClassificationTable: binary code that indicates which classification table is being used for the corresponding 
classification entity. 0x00 is reserved to indicate no specific classification table. 

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

ClassificationInfo: null-terminated string in either UTF-8 or UTF-16 characters, giving classification information,
taken from the corresponding classification table, if specified. If UTF-16 is used, the string shall start with the
BYTE ORDER MARK (0xFEFF).

Table 8.9: The Keywords box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'kywd' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| KeywordCnt | Unsigned int(8) | Binary number of keywords | |
| Keywords | KeywordStruct[KeywordCnt] | Array of structures that hold the actual keywords (see Table 8.9.1) | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

KeywordCnt: binary code that indicates the number of keywords provided. This number shall be greater than 0. 

Keywords: Array of structures that hold the actual keywords, according to table 8.9.1. 

Table 8.9.1: The Keyword Struct

| Field | Type | Details | Value |
|---|---|---|---|
| KeywordSize | Unsigned int(8) | Binary size of keyword | |
| KeywordInfo | String | Text of keyword | |

KeywordSize: binary code that indicates the total size (in bytes) of the keyword information field. 

KeywordInfo: null-terminated string in either UTF-8 or UTF-16 characters, giving keyword information. If UTF-16
is used, the string shall start with the BYTE ORDER MARK (0xFEFF).
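
The Keywords box is the only asset meta-data box with a counted array of length-prefixed strings, so a reader walks it entry by entry. The Python sketch below is illustrative only: the name parse_keywords is hypothetical, and it assumes that the input is the box content after the 8-byte size/type header and that KeywordSize covers the string including its terminator.

```python
import struct
from typing import List, Tuple


def parse_keywords(content: bytes) -> Tuple[int, List[str]]:
    """Return (packed language code, keywords) from 'kywd' box content."""
    # Version/Flags (4 bytes), Pad + Language (2 bytes), KeywordCnt (1 byte).
    _version_flags, packed, count = struct.unpack_from(">IHB", content, 0)
    if count == 0:
        raise ValueError("KeywordCnt shall be greater than 0")
    pos = 7
    keywords = []
    for _ in range(count):
        if pos >= len(content):
            raise ValueError("truncated Keywords box")
        size = content[pos]                  # KeywordSize
        pos += 1
        raw = content[pos:pos + size]        # KeywordInfo
        if len(raw) != size:
            raise ValueError("truncated keyword")
        if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
            text = raw.decode("utf-16").split("\x00", 1)[0]
        else:
            text = raw.split(b"\x00", 1)[0].decode("utf-8")
        keywords.append(text)
        pos += size
    return packed & 0x7FFF, keywords
```

The packed language code is returned with the Pad bit masked off and can be expanded into its three letters as described for the Language field.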

Table 8.10: The Location Information box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'loci' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Name | String | Text of place name | |
| Role | Unsigned int(8) | Non-negative value indicating role of location | |
| Longitude | Unsigned int(32) | Fixed-point value of the longitude | |
| Latitude | Unsigned int(32) | Fixed-point value of the latitude | |
| Altitude | Unsigned int(32) | Fixed-point value of the altitude | |
| Astronomical_body | String | Text of astronomical body | |
| Additional_notes | String | Text of additional location-related information | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Name: null-terminated string in either UTF-8 or UTF-16 characters, indicating the name of the place. If UTF-16 is
used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Role: indicates the role of the place. Value 0 indicates 'shooting location', 1 indicates 'real location', and 2 indicates
'fictional location'. Other values are reserved.

Longitude: fixed-point 16.16 number indicating the longitude in degrees. Negative values represent western longitude.

Latitude: fixed-point 16.16 number indicating the latitude in degrees. Negative values represent southern latitude.

Altitude: fixed-point 16.16 number indicating the altitude in meters. The reference altitude, indicated by zero, is set to 
the sea level. 

Astronomical_body: null-terminated string in either UTF-8 or UTF-16 characters, indicating the astronomical body on
which the location exists, e.g. 'earth'. If UTF-16 is used, the string shall start with the BYTE ORDER MARK
(0xFEFF).

Additional_notes: null-terminated string in either UTF-8 or UTF-16 characters, containing any additional
location-related information. If UTF-16 is used, the string shall start with the BYTE ORDER MARK (0xFEFF).

NOTE 1: If the location information refers to a time-variant location, 'Name' should express a high-level location,
such as 'Finland' for several places in Finland or 'Finland-Sweden' for several places in Finland and
Sweden. Further details on time-variant locations can be provided as 'Additional_notes'.

NOTE 2: The values of longitude, latitude and altitude provide cursory Global Positioning System (GPS) 
information of the media content. 

NOTE 3: A value of longitude (latitude) that is less than -180 (-90) or greater than 180 (90) indicates that the GPS 
coordinates (longitude, latitude, altitude) are unspecified, i.e. none of the given values for longitude, 
latitude or altitude are valid. 
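
The 16.16 fixed-point coordinates and the validity rule of NOTE 3 can be illustrated as follows. This Python sketch is non-normative and its function names are hypothetical. Because the fields are typed as 32-bit values while negative coordinates are allowed, the sketch assumes a two's-complement interpretation of the stored value.

```python
def fixed_16_16_to_float(raw: int) -> float:
    """Convert a stored 32-bit 16.16 value to degrees or metres."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("raw value must fit in 32 bits")
    if raw & 0x80000000:
        raw -= 1 << 32          # assumed two's-complement sign handling
    return raw / 65536.0


def float_to_fixed_16_16(value: float) -> int:
    """Convert degrees or metres to the stored 32-bit 16.16 value."""
    return round(value * 65536) & 0xFFFFFFFF


def coordinates_specified(longitude: float, latitude: float) -> bool:
    """Apply NOTE 3: out-of-range values mean the coordinates are unspecified."""
    return -180 <= longitude <= 180 and -90 <= latitude <= 90
```

For example, a longitude of 24.75 degrees east is stored as 0x0018C000, and half a degree west as 0xFFFF8000.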

Table 8.11: The Album box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'albm' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| AlbumTitle | String | Text of album title | |
| TrackNumber | Unsigned int(8) | Optional integer with track number | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

AlbumTitle: null-terminated string in either UTF-8 or UTF-16 characters, giving album information. If UTF-16 is
used, the string shall start with the BYTE ORDER MARK (0xFEFF).

TrackNumber: the track number (order number) of the media on this album. This is an optional field. 

Table 8.12: The Recording Year box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'yrrc' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| RecordingYear | Unsigned int(16) | Integer value of recording year | |

RecordingYear: the year when the media was recorded.

Table 8.12a: The Collection name box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'coll' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| Name | String | Text of collection name | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. 

Name: null-terminated string in either UTF-8 or UTF-16 characters, giving collection name information. A collection
contains works that may be conceptually independent, usually with some aspect in common, and may be
user-defined. If UTF-16 is used, the string shall start with the BYTE ORDER MARK (0xFEFF).

Table 8.12b: The User-rating box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'urat' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Unsigned int(24) | | |
| StarRating | Unsigned int(8) | User's "star" rating | |

StarRating: either the value 0 (indicating no rating assigned) or a value in the range 10 through 50 inclusive,
indicating a rating between 1 star (1.0, lowest rated by the user) and 5 stars (5.0, highest rated by the user)
inclusive.
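
The mapping from the stored StarRating value to a number of stars amounts to a division by ten. The following Python sketch is illustrative only; the function name is hypothetical, it assumes that the "no rating assigned" value is 0, and it treats all other values outside the range 10 to 50 as errors.

```python
from typing import Optional


def star_rating_to_stars(value: int) -> Optional[float]:
    """Return the rating in stars, or None when no rating is assigned."""
    if value == 0:
        return None                 # assumed "no rating assigned" value
    if 10 <= value <= 50:
        return value / 10           # e.g. 35 corresponds to 3.5 stars
    raise ValueError("StarRating is outside the permitted range")
```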

Table 8.12c: The Thumbnail box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'thmb' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Format | Unsigned int(32) | Four-character code of the coding format | |
| Data | bytes to end of box | Image data | |

Format: four-character code that indicates the encoding system for the thumbnail or thumbnail reference. The value
shall be "jpeg".

Data: the image data, as indicated in the Format field. The Data is the image or reference in the indicated format. The
Format "jpeg" indicates an image in the JPEG format, which shall conform to the requirements of section 7.5
of [3] (i.e. 3GPP TS 26.234).



8.3 ID3 version 2 meta data

ID3 version 2 meta-data can be stored in 3GP files by using the Meta box defined by the ISO base media file format [7]. 
The procedure is specified by MP4REG, the MP4 Registration Authority [32] and is provided here for information. 

The ID3v2 meta data is stored in the Meta box ("meta"), which shall contain a Handler box with handler "ID32". The 
actual meta data is either stored in one or more ID3v2 box(es) inside the meta-data box, or this entire set of box(es) is 
referenced as the primary item, and stored elsewhere. The ID3v2 box is defined in Table 8.13.

Table 8.13: ID3v2 box

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | 'ID32' |
| BoxHeader.Version | Unsigned int(8) | | |
| BoxHeader.Flags | Bit(24) | | |
| Pad | Bit(1) | | |
| Language | Unsigned int(5)[3] | Packed ISO-639-2/T language code | |
| ID3v2data | Unsigned int(8)[] | Complete ID3 version 2.x.x data | |

Language: declares the language code for the following text. See ISO 639-2/T for the set of three character codes. 
Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three 
lower-case letters, so these values are strictly positive. If there are language fields inside the ID3 tag, this language
code must not conflict with them. Instead, the codes 'mul' (multiple languages) and 'und' (undetermined language)
should be used in such cases.

ID3v2data: binary data that corresponds to the ID3v2 tag format (e.g. for v.2.4.0:
http://www.id3.org/id3v2.4.0-structure.txt) and its native frames (e.g. for v.2.4.0:
http://www.id3.org/id3v2.4.0-frames.txt). The ID3 tag must not contain any footer information, because it is never
needed. Both the ID3v2 tag format and its native frames must use the same version of the specification. The size of
this field can be derived from the box size. The version of the ID3 data may be found by inspecting it.

The ID3v2 box contains complete ID3 version 2.x.x data. It should be parsed according to the ID3v2 [33] specifications
for v.2.x.x tags. There may be multiple ID3v2 boxes using different language codes.



9 Video buffer information

9.1 General

A 3GP file can include video-buffer parameters associated with video streams. For the case when only one set of
parameters is associated with an entire video stream, these can be included in the corresponding media-level SDP
fragment. However, in order to provide buffer parameters for different operation points, as defined below, and for
different synchronization points, a track can contain a video buffer sample grouping. The type of sample grouping
depends on which video-buffer model is used for a particular video codec.

For H.263 and MPEG-4 visual, the PSS buffering model, defined in Annex G of TS 26.234 [3] (PSS Annex G), is used.
Buffer parameters for several operation points and synchronization points may be specified by a 3GPP PSS Annex G
sample grouping as defined in clause 9.2.1.

For H.264 (AVC), there are two types of buffers: 

- H.264 (AVC) Hypothetical Reference Decoder (HRD) model; 

- de-interleaving buffer of the interleaved RTP packetization mode of H.264 (AVC). 

Buffer parameters for several operation points and synchronization points of the HRD model may be specified by an 
AVC HRD sample grouping as defined in clause 9.2.2. Only one set of de-interleaving parameters can be associated to 
a stream and therefore the de-interleaving parameters are included in the corresponding media-level SDP fragment 
according to the H.264 (AVC) MIME/SDP specification in [30]. 

NOTE: Any VUI HRD parameters, buffering period SEI message, and picture timing SEI message in H.264 
(AVC) streams or included in the sprop-parameter-sets MIME/SDP parameter of a media-level SDP 
fragment must not contradict each other or the information in the AVC HRD sample grouping, if any. 

9.2 Sample groupings for video-buffer parameters 

A sample grouping is an assignment of each sample in a track to be a member of one (or none) of several sample 
groups, based on a grouping criterion. The assignment of buffer parameters to synchronization points (sync samples)
provides one sample grouping of the samples in a track. The usage of sample groups in 3GP files shall follow the syntax
defined in [20]. 

Each sample is associated to zero or one sample group entries of any given grouping type in the sample group 
description box ('sgpd'). Sample group entries for sample groups defined by the grouping type '3gag' are given by the 
3GPP PSS Annex G Sample group entry, defined in Table 9.1, and sample group entries for sample groups defined by 
the grouping type 'avcb' are given by the AVC HRD Sample group entry, defined in Table 9.2. 

Sample group entries provide buffer parameters relevant to all samples in the corresponding sample group(s). A sync 
sample and all following non-sync samples before the next sync sample shall be members of the same sample group 
with respect to the video-buffer grouping type. The indicated buffer parameters for a sync sample are applicable for the 
stream from that sync sample onwards. 

NOTE: A file, in which some but not all samples are associated with sample groups with respect to the grouping 
type '3gag' or 'avcb', may have been edited and may therefore no longer conform to corresponding buffer 
model. 

9.2.1 3GPP PSS Annex G sample grouping 

The grouping type '3gag' defines the grouping criterion for 3GPP PSS Annex G buffer parameters. Zero or one sample- 
to-group box ('sbgp') for the grouping type '3gag' can be contained in the sample table box ('stbl') of a track. It shall 
reside in a hint track, if a hint track is used, otherwise in the video track. The presence of this box and grouping type 
indicates that the associated video stream complies with PSS Annex G. Note that the nature of the track defines the 
media transport for which the buffer parameters are calculated, e.g. for an RTP hint track, the media transport is RTP. 

Table 9.1: 3GPP PSS Annex G sample group entry

| Field | Type | Details | Value |
|---|---|---|---|
| BufferParameters | AnnexGstruc | Structure which holds the buffer parameters of PSS Annex G | |

BufferParameters: the structure where the PSS Annex G buffer parameters reside. 

AnnexGstruc is defined as follows: 

struct AnnexGstruc {
    Unsigned int(16) operation_point_count
    for (i = 0; i < operation_point_count; i++) {
        Unsigned int(32) tx_byte_rate
        Unsigned int(32) dec_byte_rate
        Unsigned int(32) pre_dec_buf_size
        Unsigned int(32) init_pre_dec_buf_period
        Unsigned int(32) init_post_dec_buf_period
    }
}

The definitions of the AnnexGstruc members are as follows: 

operation_point_count: specifies the number of operation points, each characterized by a pair of transmission byte rate 
and decoding byte rate. Values of buffering parameters are specified separately for each operation point. The value of 
operation_point_count shall be greater than 0. 

tx_byte_rate: indicates the transmission byte rate (in bytes per second) that is used to calculate the transmission
timestamps of media-transport packets for the PSS Annex G buffering verifier as follows. Let t1 be the transmission
time of the previous media-transport packet and size1 be the number of bytes in the payload of the previous
media-transport packet in transmission order, excluding the media-transport payload header and any lower-layer headers. For
the first media-transport packet of the stream, t1 and size1 are equal to 0. The media track shall comply with PSS Annex
G when each sample is packetized in one media-transport packet, the transmission order of media-transport packets is
the same as their decoding order, and the transmission time of a media-transport packet is equal to t1 + size1 /
tx_byte_rate. The value of tx_byte_rate shall be greater than 0.
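
The timestamp rule above can be illustrated with a short Python sketch. It is not part of this specification: the function name and the list-based interface are assumptions made for illustration, and payload sizes are assumed to be given in transmission order with payload headers already excluded.

```python
from typing import List


def transmission_times(payload_sizes: List[int], tx_byte_rate: int) -> List[float]:
    """Return the transmission time (seconds) of each packet, in transmission order."""
    if tx_byte_rate <= 0:
        raise ValueError("tx_byte_rate shall be greater than 0")
    times = []
    prev_time = 0.0  # transmission time of the previous packet (0 for the first packet)
    prev_size = 0    # payload size of the previous packet (0 for the first packet)
    for size in payload_sizes:
        current = prev_time + prev_size / tx_byte_rate
        times.append(current)
        prev_time, prev_size = current, size
    return times
```

For example, payloads of 1000, 500 and 2000 bytes at tx_byte_rate 1000 are scheduled at 0.0, 1.0 and 1.5 seconds.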

dec_byte_rate: indicates the peak decoding byte rate that was used in this operation point to verify the compatibility of
the stream with PSS Annex G. Values are given in bytes per second. The value of dec_byte_rate shall be greater than 0.

pre_dec_buf_size: indicates the size of the PSS Annex G hypothetical pre-decoder buffer in bytes that guarantees
pauseless playback of the entire stream under the assumptions of PSS Annex G.

init_pre_dec_buf_period: indicates the required initial pre-decoder buffering period that guarantees pauseless
playback of the entire stream under the assumptions of PSS Annex G. Values are interpreted as clock ticks of a 90-kHz
clock. That is, the value is incremented by one for each 1/90 000 seconds. For example, value 180 000 corresponds to a
two second initial pre-decoder buffering.

init_post_dec_buf_period: indicates the required initial post-decoder buffering period that guarantees pauseless 
playback of the entire stream under the assumptions of PSS Annex G. Values are interpreted as clock ticks of a 90-kHz 
clock. 
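
As a minimal sketch of how a reader might decode the structure described above, the following Python fragment unpacks an AnnexGstruc payload and converts 90-kHz clock ticks to seconds. It assumes the big-endian byte order used by the ISO base media file format [7] and that the payload starts directly at operation_point_count; the class and function names are hypothetical and not defined by this specification.

```python
import struct
from typing import List, NamedTuple


class AnnexGOperationPoint(NamedTuple):
    tx_byte_rate: int
    dec_byte_rate: int
    pre_dec_buf_size: int
    init_pre_dec_buf_period: int
    init_post_dec_buf_period: int


def ticks_to_seconds(ticks: int) -> float:
    """Convert a period expressed in 90-kHz clock ticks to seconds."""
    return ticks / 90000


def parse_annex_g_struc(payload: bytes) -> List[AnnexGOperationPoint]:
    """Decode an AnnexGstruc payload (illustrative helper, big-endian assumed)."""
    (count,) = struct.unpack_from(">H", payload, 0)
    if count == 0:
        raise ValueError("operation_point_count shall be greater than 0")
    points = []
    offset = 2
    for _ in range(count):
        # Five consecutive 32-bit unsigned fields per operation point.
        points.append(AnnexGOperationPoint(*struct.unpack_from(">5I", payload, offset)))
        offset += 20
    return points
```

A value of 180 000 in init_pre_dec_buf_period then maps to ticks_to_seconds(180000), i.e. two seconds.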



9.2.2 AVC HRD sample grouping 



The grouping type 'avcb' defines the grouping criterion for AVC HRD parameters. Zero or one sample-to-group box 
('sbgp') for the grouping type 'avcb' can be contained in the sample table box ('stbl') of a track. It shall reside either in a 
hint track or a video track. The presence of this box and grouping type indicates that the associated video stream 
complies with AVC HRD with the indicated parameters. 



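Table 9.2: AVC HRD sample group entry

| Field            | Type        | Details                                       | Value |
|------------------|-------------|-----------------------------------------------|-------|
| AVCHRDParameters | AVCHRDstruc | Structure which holds the AVC HRD parameters  |       |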





AVCHRDParameters: the structure where the AVC HRD parameters reside. 
AVCHRDstruc is defined as follows: 



struct AVCHRDstruc {
    unsigned int(16) operation_point_count
    for (i = 0; i < operation_point_count; i++) {
        unsigned int(32) tx_byte_rate
        unsigned int(32) pre_dec_buf_size
        unsigned int(32) post_dec_buf_size
        unsigned int(32) init_pre_dec_buf_period
        unsigned int(32) init_post_dec_buf_period
    }
}



The definitions of the AVCHRDstruc members are as follows: 

operation_point_count: specifies the number of operation points. Values of AVC HRD parameters are specified 
separately for each operation point. The value of operation_point_count shall be greater than 0. 

tx_byte_rate: indicates the input byte rate (in bytes per second) to the coded picture buffer (CPB) of AVC HRD. The 
bitstream is constrained by the value of BitRate equal to 8 * the value of tx_byte_rate for NAL HRD parameters as 
specified in [29]. For VCL HRD parameters, the value of BitRate is equal to tx_byte_rate * 40 / 6. The value of 
tx_byte_rate shall be greater than 0. 

pre_dec_buf_size: gives the required size of the pre-decoder buffer or coded picture buffer in bytes. The bitstream is
constrained by the value of CpbSize equal to pre_dec_buf_size * 8 for NAL HRD parameters as specified in [29]. For
VCL HRD parameters, the value of CpbSize is equal to pre_dec_buf_size * 40 / 6.

At least one pair of values of tx_byte_rate and pre_dec_buf_size of the same operation point shall conform to the 
maximum bitrate and CPB size allowed by profile and level of the stream. 
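
The scaling between the byte-based fields and the HRD variables BitRate and CpbSize can be sketched as follows. This is an illustration only: the function name and the boolean switch between NAL and VCL HRD parameters are assumptions, and exact rational arithmetic is used so that the factor 40 / 6 is not rounded.

```python
from fractions import Fraction
from typing import Tuple


def hrd_constraints(tx_byte_rate: int, pre_dec_buf_size: int,
                    nal_hrd: bool = True) -> Tuple[Fraction, Fraction]:
    """Return (BitRate, CpbSize) implied by one operation point."""
    # Factor 8 applies to NAL HRD parameters, 40 / 6 to VCL HRD parameters.
    factor = Fraction(8) if nal_hrd else Fraction(40, 6)
    return tx_byte_rate * factor, pre_dec_buf_size * factor
```

For instance, tx_byte_rate 48 000 and pre_dec_buf_size 30 000 give BitRate 384 000 and CpbSize 240 000 for NAL HRD parameters, and 320 000 and 200 000 for VCL HRD parameters.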

post_dec_buf_size: gives the required size of the post-decoder buffer, or the decoded picture buffer, in units of bytes.
The bitstream is constrained by the value of max_dec_frame_buffering equal to Min( 16, Floor( post_dec_buf_size /
( PicWidthInMbs * FrameHeightInMbs * 256 * ChromaFormatFactor ) ) ) as specified in [29]. If the SDP attribute
3gpp-videopostdecbufsize is not present for an H.264 (AVC) stream, the value of max_dec_frame_buffering is inferred as
specified in [29].
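
The constraint on max_dec_frame_buffering can be worked through in a few lines of Python. The sketch reads the expression as Min( 16, Floor( post_dec_buf_size / ( PicWidthInMbs * FrameHeightInMbs * 256 * ChromaFormatFactor ) ) ); the function name is hypothetical, and the default ChromaFormatFactor of 1.5 (4:2:0 sampling) is an assumption that should be checked against [29].

```python
import math


def max_dec_frame_buffering(post_dec_buf_size: int,
                            pic_width_in_mbs: int,
                            frame_height_in_mbs: int,
                            chroma_format_factor: float = 1.5) -> int:
    """Number of frame buffers implied by post_dec_buf_size, capped at 16."""
    # One macroblock holds 256 luma samples; chroma is covered by the factor.
    bytes_per_frame = pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    return min(16, math.floor(post_dec_buf_size / bytes_per_frame))
```

For a QCIF picture (11 x 9 macroblocks) one frame occupies 38 016 bytes under these assumptions, so a post_dec_buf_size of 152 064 corresponds to four frames.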

init_pre_dec_buf_period: gives the required delay between the time of arrival in the pre-decoder buffer of the first bit
of the first access unit and the time of removal from the pre-decoder buffer of the first access unit. It is in units of a 90
kHz clock. The bitstream is constrained by the value of the nominal removal time of the first access unit from the coded
picture buffer (CPB), tr,n( 0 ), equal to init_pre_dec_buf_period as specified in [29].

init_post_dec_buf_period: gives the required delay between the time of arrival in the post-decoder buffer of the first 
decoded picture and the time of output from the post-decoder buffer of the first decoded picture. It is in units of a 90 
kHz clock. The bitstream is constrained by the value of dpb_output_delay for the first decoded picture in output order 
equal to init_post_dec_buf_period as specified in [29] assuming that the clock tick variable, tc, is equal to 1 / 90 000. 



9a Stereoscopic 3D video

9a.1 General

Stereoscopic 3D video can be encapsulated and delivered in 3GP files (or 3GP segments in the case of DASH). Frame 
compatible H.264/AVC and temporally interleaved H.264/AVC use the traditional AVC file format [20] where 
information about the stereo arrangement is carried in an SEI message "frame packing arrangement SEI". Multiview 
Video Coding MVC on the other hand uses extensions of the AVC file format [20] which specify separate signalling for 
MVC streams. 

Storing frame compatible or temporally interleaved stereoscopic 3D video in a 3GP file as described above ensures that 
a UE can decode the bitstreams correctly (if it has the corresponding decoding capability), but it does not ensure that a 
UE renders the 3D video correctly. For instance, a UE that is not aware of the SEI message indicating that a bitstream 
represents frame compatible 3D or temporally interleaved 3D will simply render the video frames as consecutive 2D 
frames. The output will most likely appear garbled or show disturbing artefacts to the viewer.

The above problem is avoided by enforcing post-decoder requirements with the restricted video mechanism specified in
the ISO base media file format [7]. The mechanism is similar to the content protection transformation where sample
entries are hidden behind generic sample entries, "encv", "enca", etc., indicating encrypted or encapsulated media.
The analogous mechanism for restricted video uses a transformation with the generic sample entry "resv". The
method should be applied when the content should only be decoded by clients that present it correctly. For the above
cases with frame compatible and temporally interleaved 3D video, the scheme type for stereoscopic video "stvi" [7]
should be used.

In addition, UEs consuming content provided in the 3GP file format expect to identify the content based on the MIME 
Type of the 3GP file in order to accept or reject content. RFC6381 [34] provides an ability to signal profile and codec 
parameters and may be considered to be used in this context as well. For more details refer to clause 10.5. 

The following sub-clauses describe stereoscopic 3D file format signalling in more detail including the case of mixed 
services. 



9a.2 Frame compatible H.264/AVC 



Frame compatible H.264/AVC is stored in a 3GP file as defined for H.264/AVC in the AVC file format [20] where the
AVC sample entry has been transformed according to the restricted video mechanism using the sample entry "resv"
and the stereo video scheme type "stvi" [7]. The stereo scheme of the stereo video box is 1, i.e. the stereo indication
type identifies the frame packing arrangement type by using the values defined by the frame packing arrangement SEI
(Table D-8 of ISO/IEC 14496-10), for example 3 for side by side or 5 for temporal interleaving.

NOTE: It is recommended to use the "frame packing arrangement SEI" rather than the "stereo video SEI".

9a.3 Multiview Video Coding MVC

Multiview Video Coding MVC is stored and signalled in a 3GP file as defined for MVC in the AVC file format [20]. In
order to ensure compatibility with H.264 (AVC) file readers, one track shall use sample entry type "avc1".

9a.4 Mixed 2D/3D video 

Decoding and rendering requirements are signalled in the sample entry descriptions as detailed above. In fact, each 
video sample in a 3GP file is associated with a sample entry description, which in turn specifies if the video data is 2D 
H.264/AVC or any of the above types of 3D H.264/AVC. If a file contains both 2D and 3D video, separate sample entry
descriptions are used where 2D parts of the file are associated with the 2D sample entry description and 3D parts of the 
file with the appropriate 3D sample entry description. 

9a.5 MIME type signaling for 3D stereoscopic video files 

UEs consuming content provided in the 3GP file format expect to identify the content based on the MIME Type of the 
3GP file in order to accept or reject content. RFC6381 [34] provides an ability to signal profile and codec parameters 
and may be used in this context as well.

To signal content provided in MVC, the codecs parameter as defined in RFC6381 [34] may be used. The details on how 
to signal MVC content are provided in RFC6381 [34], clause 3.3. The clause addresses also the use case when MVC 
content is coded in an H.264/AVC-compatible fashion. 

In case of mixed content, all required capabilities may be signalled in the MIME type parameters. 



10 Encryption 

10.1 General 

A 3GP file may include encrypted media together with information on key management and requirements for 
decrypting and/or serving encrypted media. Tracks containing encrypted media use dedicated sample entries for 
encrypted media, which will be ignored by 3GP readers not capable of handling encrypted media. 3GP readers capable 
of detecting encrypted media are able to obtain 'in the clear' the sample entries that apply to the decrypted media as well 
as all requirements for decrypting the media. Moreover, 3GP readers supporting extended presentations (see clause 11) 
referring to media files rather than media tracks are provided with all requirements for decrypting media files. 

Clauses 10.2 and 10.3 are provided here for information in the context of 3GP files. The definitions follow from [7].

10.2 Sample entries for encrypted media tracks

The sample entries stored in the sample description box of a media track in a 3GP file identify the format of the 
encoded media, i.e. codec and other coding parameters. All valid sample entries for unencrypted media in a 3GP file are 
described in Clause 6. The principle behind storing encrypted media in a track is to 'disguise' the original sample entry 
with a generic sample entry for encrypted media. Table 10.1 gives an overview of the formats (identifying sample 
entries) that can be used in 3GP files for signalling encrypted video, audio and text. 

Table 10.1: Formats for encrypted media tracks

| Format | Original format                     | Media content                                                      |
|--------|-------------------------------------|--------------------------------------------------------------------|
| 'encv' | 's263', 'mp4v', 'avc1', ...         | encrypted video: H.263, MPEG-4 visual, H.264 (AVC), ...            |
| 'enca' | 'samr', 'sawb', 'sawp', 'mp4a', ... | encrypted audio: AMR, AMR-WB, AMR-WB+, Enhanced aacPlus, AAC, ...  |
| 'enct' | 'tx3g', ...                         | encrypted text: timed text, ...                                    |



The generic sample entries for encrypted media replicate the original sample entries and include a Protection scheme
information box with details on the original format, as well as all requirements for decrypting the encoded media. The
EncryptedVideoSampleEntry and the EncryptedAudioSampleEntry are defined in Tables 10.2 and 10.3, where the
ProtectionSchemeInfoBox (defined in clause 10.3) is simply added to the list of boxes contained in a sample entry.

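Table 10.2: EncryptedVideoSampleEntry

| Field                   | Type             | Details                                                     | Value  |
|-------------------------|------------------|-------------------------------------------------------------|--------|
| BoxHeader.Size          | Unsigned int(32) |                                                             |        |
| BoxHeader.Type          | Unsigned int(32) |                                                             | "encv" |
| (All fields and boxes of a visual sample entry, e.g. MP4VisualSampleEntry or H263SampleEntry.) | | | |
| ProtectionSchemeInfoBox |                  | Box with information on the original format and encryption |        |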





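Table 10.3: EncryptedAudioSampleEntry

| Field                   | Type             | Details                                                     | Value  |
|-------------------------|------------------|-------------------------------------------------------------|--------|
| BoxHeader.Size          | Unsigned int(32) |                                                             |        |
| BoxHeader.Type          | Unsigned int(32) |                                                             | "enca" |
| (All fields and boxes in an audio sample entry, e.g. MP4AudioSampleEntry or AMRSampleEntry.) | | | |
| ProtectionSchemeInfoBox |                  | Box with information on the original format and encryption |        |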





The EncryptedVideoSampleEntry and the EncryptedAudioSampleEntry can also be used with any additional codecs 
added to the 3GP file format, as long as their sample entries are based on the SampleEntry of the ISO base media file 
format [7]. 

The EncryptedTextSampleEntry is defined in Table 10.4. Text tracks are specific to 3GP files and defined by the Timed
text format [4]. In analogy with the cases for audio and video, a ProtectionSchemeInfoBox is added to the list of
contained boxes.

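Table 10.4: EncryptedTextSampleEntry

| Field                   | Type             | Details                                                     | Value  |
|-------------------------|------------------|-------------------------------------------------------------|--------|
| BoxHeader.Size          | Unsigned int(32) |                                                             |        |
| BoxHeader.Type          | Unsigned int(32) |                                                             | "enct" |
| (All fields and boxes of TextSampleEntry.) | | | |
| ProtectionSchemeInfoBox |                  | Box with information on the original format and encryption |        |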





NOTE: The boxes within the sample entries defined in Tables 10.2-10.4 may not precede any of the fields. The
order of the boxes (including the ProtectionSchemeInfoBox) is not important though.



10.3 Key management 



The necessary requirements for decrypting media are stored in the Protection scheme information box. For the case of 
media tracks, it contains the Original format box, which identifies the codec of the decrypted media. For both media 
tracks and media files, it contains the Scheme type box, which identifies the protection scheme used to protect the 
media, and the Scheme information box, which contains scheme-specific data (defined for each scheme). It is out of the 
scope of this specification to define a protection scheme. 

The Protection scheme information box and its contained boxes are defined in Tables 10.5-10.8.

Table 10.5: ProtectionSchemeInfoBox

| Field                | Type             | Details                                          | Value  |
|----------------------|------------------|--------------------------------------------------|--------|
| BoxHeader.Size       | Unsigned int(32) |                                                  |        |
| BoxHeader.Type       | Unsigned int(32) |                                                  | "sinf" |
| OriginalFormatBox    |                  | Box identifying the original format              |        |
| SchemeTypeBox        |                  | Optional box containing the protection scheme.   |        |
| SchemeInformationBox |                  | Optional box containing the scheme information.  |        |




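Table 10.6: OriginalFormatBox

| Field          | Type             | Details         | Value  |
|----------------|------------------|-----------------|--------|
| BoxHeader.Size | Unsigned int(32) |                 |        |
| BoxHeader.Type | Unsigned int(32) |                 | "frma" |
| DataFormat     | Unsigned int(32) | original format |        |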





DataFormat identifies the format (sample entry) of the decrypted, encoded data. The currently defined formats in 3GP
files include 'mp4v', 'h263', 'avc1', 'mp4a', 'samr', 'sawb', 'sawp' and 'tx3g'.
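
To illustrate how a reader could recover the original format from an encrypted sample entry, the following Python sketch walks the boxes inside a Protection scheme information box and returns the DataFormat of the Original format box. It is a simplified illustration, not a normative procedure: the helper names are hypothetical, only boxes with a plain 32-bit size are handled (the size values 0 and 1 defined in [7] are rejected), and the input is assumed to be the body of the "sinf" box.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each box with a plain 32-bit size header."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported or truncated box")
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size


def original_format(sinf_payload: bytes) -> str:
    """Return the DataFormat code stored in the 'frma' box of a 'sinf' body."""
    for box_type, payload in iter_boxes(sinf_payload):
        if box_type == "frma":
            return payload[:4].decode("ascii")
    raise ValueError("no OriginalFormatBox found")
```

A "sinf" body holding a 12-byte "frma" box with DataFormat 'mp4a' would thus be reported as 'mp4a'.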

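Table 10.7: SchemeTypeBox

| Field             | Type               | Details                                                                     | Value  |
|-------------------|--------------------|-----------------------------------------------------------------------------|--------|
| BoxHeader.Size    | Unsigned int(32)   |                                                                             |        |
| BoxHeader.Type    | Unsigned int(32)   |                                                                             | "schm" |
| BoxHeader.Version | Unsigned int(8)    |                                                                             |        |
| BoxHeader.Flags   | Bit(24)            |                                                                             | 0 or 1 |
| SchemeType        | Unsigned int(32)   | four-character code identifying the scheme                                  |        |
| SchemeVersion     | Unsigned int(32)   | Version number                                                              |        |
| SchemeURI         | Unsigned int(8)[ ] | Browser URI (null-terminated UTF-8 string). Present if (Flags & 1) is true  |        |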





SchemeType and SchemeVersion identify the encryption scheme and its version. As an option, it is possible to
include SchemeURI with a URI pointing to a web page for users that don't have the encryption scheme installed.
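
The layout in Table 10.7 can be decoded along the following lines. The Python sketch is illustrative only: the class and function names are hypothetical, the input is assumed to be the body of the "schm" box starting at BoxHeader.Version (after BoxHeader.Size and BoxHeader.Type), and big-endian byte order is assumed as in [7].

```python
import struct
from typing import NamedTuple, Optional


class SchemeTypeInfo(NamedTuple):
    scheme_type: str
    scheme_version: int
    scheme_uri: Optional[str]


def parse_schm_payload(payload: bytes) -> SchemeTypeInfo:
    """Decode a 'schm' body: version(8) + flags(24), SchemeType, SchemeVersion, optional URI."""
    version_and_flags, code, scheme_version = struct.unpack_from(">I4sI", payload, 0)
    flags = version_and_flags & 0xFFFFFF
    uri = None
    if flags & 1:
        # SchemeURI is a null-terminated UTF-8 string following the fixed fields.
        uri = payload[12:].split(b"\x00", 1)[0].decode("utf-8")
    return SchemeTypeInfo(code.decode("ascii"), scheme_version, uri)
```

The URI is only read when the lowest flag bit is set, mirroring the "(Flags & 1)" condition in the table.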

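Table 10.8: SchemeInformationBox

| Field          | Type             | Details                                               | Value  |
|----------------|------------------|-------------------------------------------------------|--------|
| BoxHeader.Size | Unsigned int(32) |                                                       |        |
| BoxHeader.Type | Unsigned int(32) |                                                       | "schi" |
|                |                  | Box(es) specific to scheme identified by SchemeType  |        |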





The boxes contained in the Scheme information box are defined by the scheme type, which is out of the scope of this 
specification to define. 



11 Extended presentation format

11.1 General

A 3GP file may include an extended presentation that consists of media files in addition to tracks for audio, video and 
text. Examples of such media files are static images, e.g. JPEG files, which can be stored in a 3GP 'container file'. A 
3GP container file that includes an extended presentation must include a scene description that governs the rendering of 
all parts of the 3GP file.

11.2 Storage format



A 3GP file with an extended presentation shall include a Meta box ("meta") at the top level of the file as defined in [7]. 
The Meta box shall include the following boxes: 

- Handler box with handler type "3gsd" (3GPP scene description); 

- Primary item box or XML box identifying the scene description;

- Item information box;

- Item location box (see below). 

A scene description (e.g. an SVG scene, in the case of DIMS, or a SMIL file) shall be included either in an XML box or 
as an item located by the Item location box. The scene description may refer to both tracks and media files (items). 

A 3GP file that contains media files and/or a scene description not stored in an XML box shall include an Item location 
box locating all contained files and the scene description. Each item corresponding to a media file of the Item location 
box shall also be included in the Item information box in order to specify its filename (item name) and MIME type. The 
Item information box shall also include an entry for the scene description that specifies its MIME type. By referring to a 
Protection scheme information box in the Item protection box, the Item information box can also indicate whether the 
content of an item is protected (encrypted) as defined in [7] and discussed in clause 10 of the present specification. 

11.3 URL forms for items and tracks 

All media files and the scene description included in a 3GP file are logically located in the same directory as the 3GP 
file itself. In general, the Meta box of a 3GP file serves as a container of files that logically 'shadow' files outside the
3GP file. See the description of URL forms for Meta boxes in [7] for further details. The Movie box ("moov") of a 3GP 
file contains all media tracks and possible scene description update tracks. 

The scene description (primary item) of a 3GP file addresses other resources by using relative URLs. In particular it 
addresses 

- media files (items) by referring to their filenames; 

- media tracks by referring to the Movie box with the relative URL "#box=moov". 

The default is to address all tracks of the Movie box. However, it is possible to address individual media tracks in the 
Movie box by referring to their track IDs. The relative URL of a track is defined in terms of ABNF [31] as follows: 

relative-track-URL = "#box=moov;track_ID=" track-number *("," track-number)

track-number = 1*DIGIT

Hence, individual tracks are referenced by listing their numbers, e.g. "#box=moov;track_ID=1,3".
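
A scene-description processor could recognise these URL forms with a small parser such as the Python sketch below. The function name and return convention are assumptions for illustration: an empty list stands for "all tracks of the Movie box", and None is returned for anything that is not a track URL (e.g. an item filename).

```python
import re
from typing import List, Optional

# "#box=moov" optionally followed by ";track_ID=" and a comma-separated list of numbers.
_TRACK_URL = re.compile(r"#box=moov(?:;track_ID=(\d+(?:,\d+)*))?")


def parse_track_url(fragment: str) -> Optional[List[int]]:
    """Return listed track IDs, [] for all tracks, or None if not a track URL."""
    match = _TRACK_URL.fullmatch(fragment)
    if match is None:
        return None
    if match.group(1) is None:
        return []
    return [int(number) for number in match.group(1).split(",")]
```

With this sketch, "#box=moov;track_ID=1,3" yields the track IDs 1 and 3, while "#box=moov" yields an empty list.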

A DIMS (SVG) scene description (primary item) can also address scene updates in a track using the above URL forms. 
For instance, applying the updates stored in track 1 to the scene description after 10 seconds is done as follows:

<updates xlink:href="#box=moov;track_ID=1" begin="10"/>

Note: It is possible to include a 3GP file with tracks as a media file (addressed by filename) rather than using a
top-level Movie box for tracks. However, this way the included 3GP file will be 'hidden' one layer down and
interleaving between individual tracks and items will be less transparent.

11.4 Examples

11.4.1 SMIL presentation

The following example consists of a slide show in SMIL consisting of three images shown with the duration of 3
seconds each and an AMR clip that is played in parallel. The presentation is built from a number of separate files:

- SMIL file: "scene.smil";

- 3GP file with AMR: "audioclip.3gp";

- Image files: "pic1.jpg", "pic2.jpg" and "pic3.jpg".

These files can be packaged into a single 3GP file "presentation.3gp" as an extended presentation. The overall
presentation is governed by the SMIL file located as the primary item of "presentation.3gp":

<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="176" height="144"/>
      <region id="pics" left="0" width="176" height="144"/>
    </layout>
  </head>
  <body>
    <par>
      <audio src="#box=moov" dur="9s"/>
      <seq>
        <img region="pics" src="pic1.jpg" dur="3s"/>
        <img region="pics" src="pic2.jpg" dur="3s"/>
        <img region="pics" src="pic3.jpg" dur="3s"/>
      </seq>
    </par>
  </body>
</smil>

The audio track resides in the Movie box and is referred to as "#box=moov", whereas the images are included as media 
files in the Meta box. 

11.4.2 DIMS presentation

The following example consists of a DIMS presentation that refers to images, an AMR clip and scene updates. The 
presentation is contained in a single Extended-presentation profile 3GP file containing: 

- DIMS scene description (SVG scene) stored as item 1 identified by a Primary item box; 

- DIMS updates stored as a DIMS track (track ID 1); 

- AMR clip stored as an AMR track (track ID 2); 

- Image files: "pic1.jpg", "pic2.jpg" and "pic3.jpg" stored as items 2, 3 and 4.

All references to the DIMS and AMR tracks and the images are made by relative URLs from the DIMS Unit in the
primary item:

<svg xmlns="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="320" height="240" viewBox="0 0 320 240">
  <desc>DIMS example</desc>
  <updates xlink:href="#box=moov;track_ID=1" begin="10"/>
  <audio xlink:href="#box=moov;track_ID=2" audio-level=".7"
         type="audio/AMR" begin="1"/>
  <image x="0" y="0" width="100" height="100" xlink:href="pic1.jpg"/>
  <image x="0" y="100" width="100" height="100" xlink:href="pic2.jpg"/>
  <image x="100" y="0" width="100" height="100" xlink:href="pic3.jpg"/>
</svg>

An Item information box specifies the MIME type of the scene description (SVG scene) and the filenames and MIME
types of the image files. An Item location box specifies the locations of all items.

12 Media Stream Recording

12.1 Unprotected Stream Recording

Received RTP media streams may be stored in 3GP files conforming to the Media Stream Recording profile. RTP 
packets may be stored in RTP reception hint tracks. RTCP packets may be stored in RTCP reception hint tracks. 



12.2 Protected Stream recording 



SRTP protected media may be stored in 3GP files conforming to the 3GP Media Stream Recording Profile. SRTP and 
corresponding SRTCP packets are stored in SRTP reception hint tracks and SRTCP reception hint tracks, respectively, 
as described in [38]. Corresponding MIKEY MBMS Traffic Key messages are stored in OMA BCAST STKM tracks as 
described in clause 12.2.1. Additionally, SDP and Protection Description information is stored as described in clauses 
12.3 and 12.2.2. 

12.2.1 Key message tracks 

MIKEY MBMS Traffic Key messages as defined in [39] shall be stored in OMA BCAST STKM tracks "oksd" as 
defined in [37]. A 3GP file with SRTP recording extensions shall contain at least one STKM track. Furthermore, all key 
messages related to a specific SRTP reception hint track shall be recorded in the same STKM track. Track references of
type "cdsc" shall be used to link STKM tracks to SRTP reception hint tracks as described in [37]. 

In the Sample Description Entry of the STKM track, the field sample_version shall be set to 0x00 and the field
sample_type shall be set to 0xf7. The value 0xf7 indicates MIKEY MBMS Traffic Key messages.

Each Sample Entry of a STKM track shall contain exactly one MIKEY MBMS Traffic Key message in the STKM
field. That is, the STKM field shall contain the payload of the received MIKEY package (without IP and UDP headers) 
including all MIKEY headers and all MIKEY payloads and the MIKEY MAC/Signature field. 

12.2.2 Protection Description 

The ServiceProtectionDescription box shall be defined as stated in Table 12.1. The ServiceProtectionDescription box
shall be included for each Sample Description Entry of a SRTP reception hint track as a sub-box of the
SchemeInformationBox "schi" in the SRTPProcessBox "srpp" as defined in [7].

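Table 12.1: ServiceProtectionDescription box

| Field               | Type              | Details                                           | Value  |
|---------------------|-------------------|---------------------------------------------------|--------|
| BoxHeader.Size      | Unsigned int(32)  |                                                   |        |
| BoxHeader.Type      | Unsigned int(32)  |                                                   | "spdb" |
| BoxHeader.Version   | Unsigned int(8)   |                                                   |        |
| SecurityDescription | Unsigned int(8)[] | Service Protection Description Metadata Fragment  |        |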





BoxHeader Size, Type, Version: indicate the size, type and version of the ServiceProtectionDescription box. The type 
shall be "spdb" and the version shall be 0. 

SecurityDescription: this field shall contain the XML encoded Service Protection Description Metadata Fragment as
specified in [40] with the restriction that only the mediaFlow element referring to the SRTP stream from which this 
SRTP reception hint track was recorded is contained. That is, the SecurityDescription shall contain exactly one 
mediaFlow element and this element shall correspond to the stored SRTP packets described by the Sample Description 
which also contains this SecurityDescription. 

12.3 SDP 

Fragments that together constitute an SDP description shall be contained in a 3GP file with Media Stream recording
extensions. Session-level SDP, i.e. all lines before the first media-specific line ('m=' line), shall be stored as Movie SDP
information within the User Data box, as specified in [7]. Media-level SDP, i.e. an 'm=' line and the lines before the
next 'm=' line (or end of SDP), shall be stored as Track SDP information within the User Data box of the corresponding
track. Media-level SDP shall be contained in each corresponding reception hint track or media track.
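
The division into session-level and media-level SDP can be sketched as follows. This Python fragment only illustrates the splitting rule stated above; the function name is hypothetical and the storage of the resulting fragments in User Data boxes is not shown.

```python
from typing import List, Tuple


def split_sdp(sdp: str) -> Tuple[str, List[str]]:
    """Split an SDP description into its session-level part and per-media parts."""
    session_lines: List[str] = []
    media_blocks: List[List[str]] = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # Each 'm=' line opens a new media-level fragment.
            media_blocks.append([line])
        elif media_blocks:
            media_blocks[-1].append(line)
        else:
            session_lines.append(line)
    return "\n".join(session_lines), ["\n".join(block) for block in media_blocks]
```

The first returned value would be stored as Movie SDP information and each media-level fragment as Track SDP information of the corresponding track.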



13 HTTP streaming extensions 
13.1 Introduction 

This clause describes extensions to the 3GP file format related to Dynamic Adaptive Streaming over HTTP as specified 
in 3GPP TS 26.247 [49] using HTTP [48] as delivery protocol for segments. 



13.2 Segment types 



It is possible in HTTP streaming to form files from segments - or concatenated segments - which would not necessarily 
form 3GP compliant files (e.g. they do not contain a movie box). If such segments are stored in separate files (e.g. on a 
standard HTTP server) it is recommended that these "segment files" start with a segment-type box, to enable 
identification of those files, and declaration of the specifications with which they are compliant. 

A segment-type box has the same format as an 'ftyp' box [7], except that it takes the box type 'styp'. The brands within it
should include the same brands that were included in the 'ftyp' box that preceded the "moov" box, and may also include
additional brands to indicate the compatibility of this segment with various specification(s) such as the 3GP Media
Segment Profile defined in clause 5.4.10 of this specification.

Valid segment type boxes shall be the first box in a segment. Segment type boxes may be removed if segments are 
concatenated (e.g. to form a full 3GP file), but this is not required. Segment type boxes that are not first in their files 
may be ignored. 
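
As an illustration of how a client might identify a segment file, the Python sketch below checks whether a byte string starts with a segment-type box and lists its brands. It assumes the 'ftyp' layout of [7] (major brand, minor version, then compatible brands) and a plain 32-bit box size; the function name is hypothetical.

```python
import struct
from typing import List, Optional


def segment_brands(segment: bytes) -> Optional[List[str]]:
    """Return the major brand followed by compatible brands, or None if no leading 'styp'."""
    if len(segment) < 16:
        return None
    size, box_type = struct.unpack_from(">I4s", segment, 0)
    if box_type != b"styp" or size < 16 or size > len(segment):
        return None
    body = segment[8:size]
    major = body[0:4]
    # Bytes 4..7 hold the minor version; compatible brands follow in 4-byte units.
    compatible = [body[i:i + 4] for i in range(8, len(body) - 3, 4)]
    return [brand.decode("ascii") for brand in [major] + compatible]
```

A file that does not begin with 'styp' simply yields None, which matches the rule that segment type boxes that are not first may be ignored.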



13.3 Track Fragment Adjustment Box



Track Fragment Adjustment Boxes describe the relative time difference of the first samples of tracks within a movie 
fragment. When randomly accessing a 3GP file or a Media Segment at a movie fragment that contains a Track 
Fragment Adjustment Box, the Track Fragment Adjustment Box provides instructions on how the timeline of one or 
more of the tracks may be modified to generate synchronization between the tracks. For example, if, in the previous 
fragment, one track ended later than another, the first sample of that track in this fragment will need to be presented 
later also; an edit-list in the track fragment adjustment box containing an empty edit, and then a media edit, achieves 
that effect. 

The syntax of a Track Fragment Adjustment Box as described below is identical to that of edit-lists. However, unlike
edit-lists, which must always be applied, when present, to adjust the timelines of the containing tracks, a Track
Fragment Adjustment Box may only be applied when randomly accessing a 3GP file or a Media Segment at a movie
fragment containing the Track Fragment Adjustment Box. In continuous playback, wherein the track alignment is
known (e.g. from decoding the previous segment) and sync between tracks has been achieved, the Track Fragment
Adjustment Box shall not be applied.

The container of the Track Fragment Adjustment Box is the Track Fragment Box. If present, the Track Fragment 
Adjustment Box should be positioned after the Track Fragment Header Box and before the first Track Fragment Run 
box. The Track Fragment Adjustment Box is a container for the Track Fragment Media Adjustment Boxes. 

aligned(8) class TrackFragmentAdjustmentBox extends Box("tfad") {
}

The Track Fragment Media Adjustment Box provides explicit time line offsets. By indicating "empty" time, or by 
defining a "dwell", the offset can advantageously delay the playback time of the media in the track so that media in 
different tracks can be synchronized. Alternatively, the media_time value may be used to discard part of the 'earlier' 
tracks. 

aligned(8) class TrackFragmentMediaAdjustmentBox extends FullBox("tfma", version, 0) {
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) {
        if (version==1) {
            unsigned int(64) segment_duration;
            int(64) media_time;
        } else { // version==0
            unsigned int(32) segment_duration;
            int(32) media_time;
        }
        int(16) media_rate_integer;
        int(16) media_rate_fraction = 0;
    }
}

version is an integer that specifies the version of this box (0 or 1). 

entry_count is an integer that gives the number of entries in the following table. 

segment_duration is an integer that specifies the duration of this adjustment segment in units of the timescale in the 
Movie Header Box. 'Adjustment segment' in this context does not refer to the 'Media Segment' that contains the "tfma" 
but refers to the operation that is performed to place the track at appropriate composition time. 

media_time is an integer containing the starting time within the media of this adjustment segment (in media time scale 
units, in composition time). If this field is set to -1, it is an empty edit. The last adjustment in a track shall never be an 
empty edit. 

media_rate_integer specifies the relative rate at which to play the media corresponding to this adjustment segment. If
this value is 0, then the adjustment is specifying a "dwell": the media at media_time is presented for the
segment_duration. Otherwise this field shall contain the value 1.
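
The field semantics above can be exercised with a small decoder. The following Python sketch assumes that the input is the body of a "tfma" box starting at the FullBox version byte, that fields are big-endian as in [7], and that the media rate fields are repeated per entry as in edit-lists; the class and function names are hypothetical.

```python
import struct
from typing import List, NamedTuple


class Adjustment(NamedTuple):
    segment_duration: int
    media_time: int          # -1 denotes an empty edit
    media_rate_integer: int  # 0 denotes a "dwell", otherwise 1
    media_rate_fraction: int


def parse_tfma_payload(payload: bytes) -> List[Adjustment]:
    """Decode a 'tfma' body: version(8), flags(24), entry_count, then the entries."""
    version = payload[0]
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    # Version 1 uses 64-bit duration and time, version 0 uses 32-bit fields.
    entry_format = ">Qqhh" if version == 1 else ">Iihh"
    entry_size = struct.calcsize(entry_format)
    offset = 8
    entries = []
    for _ in range(entry_count):
        entries.append(Adjustment(*struct.unpack_from(entry_format, payload, offset)))
        offset += entry_size
    return entries
```

An empty edit followed by a media edit, as in the synchronization example of this clause, would appear as two entries whose first media_time is -1.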



13.4 Segment Index Box 



The Segment Index box ('sidx') provides a compact index of one track within the media segment to which it applies.
The index refers to movie fragments and other Segment Index boxes in a segment.

Each Segment Index Box documents how a (sub)segment is divided into one or more subsegments (which may 
themselves be subdivided using Segment Index boxes). A subsegment is defined as a time interval of the containing 
(sub)segment, and corresponds to a single range of bytes of the containing (sub)segment. The durations of all the 
subsegments sum to the duration of the containing (sub)segment. 

Specifically for this file format, a subsegment is a self-contained set of one or more consecutive Movie Fragment boxes
with corresponding Media Data box(es). A Media Data box containing data referenced by a Movie Fragment box
must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same
track. The presentation times documented in the Segment Index are in the movie timeline, that is, they are composition
times after the application of any edit list for the track.

Each entry in the Segment Index box contains a reference type that indicates whether the reference points directly to the 
media bytes of a referenced leaf subsegment, or to a Segment Index box that describes how the referenced subsegment 
is further subdivided; as a result, the segment may be indexed in a "hierarchical" or "daisy-chain" or other form by 
documenting time and byte offset information for other Segment Index boxes applying to portions of the same 
(sub)segment. 

A Segment Index box provides information about a single track of the Segment, referred to as the reference stream. If 
provided, the first Segment Index box in a segment, for a given track, shall document the entirety of that track in the 
segment, and shall precede any other Segment Index box in the segment for the same track. 

If a Segment Index is present for at least one track but not all tracks in the segment, then normally a track in which not
every sample is independently coded, such as video, is selected to be indexed. For any track for which no segment index
is present, referred to as a non-indexed stream, the track associated with the first Segment Index box in the segment
serves as a reference stream in the sense that it also describes the subsegments for any non-indexed track.

A Segment Index box contains a sequence of references to subsegments of the (sub)segment documented by the box. 
The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index 
box are always contiguous within the segment. The referenced size gives the count of the number of bytes in the 
material referenced. 

NOTE: A media segment may be indexed by more than one 'top-level' Segment Index box; these boxes are independent of
each other, and each indexes one track within the media segment. In segments containing multiple tracks, the
referenced bytes may contain media from multiple tracks, even though the Segment Index box provides timing
information for only one track.

The anchor point for a Segment Index box is the first byte after that box. 

Within the two constraints (a) that, in time, the subsegments are contiguous, that is, each entry in the loop is consecutive 
from the immediately preceding one and (b) within a given segment the referenced bytes are contiguous, there are a 
number of possibilities, including: 

1. a reference to a segment index box may include, in its byte count, immediately following Segment Index 
boxes that document subsegments; 

2. using the first_offset field, it is possible to separate Segment Index boxes from the media that they
refer to;

3. it is possible to locate Segment Index boxes for subsegments close to the media they index. 

The Segment Index box documents the presence of Stream Access Points (SAPs), as specified in Annex G.6 of
TS 26.247 [49], in the referenced subsegments. The annex specifies characteristics of SAPs, such as Isau, Isap and Tsap,
as well as SAP types, which are all used in the semantics below. A subsegment starts with a SAP when the subsegment
contains a SAP, and for the first SAP, Isau is the index of the first sample that follows Isap, and Isap is contained in the
subsegment.

The container for the 'sidx' box is the file or segment directly.

aligned(8) class SegmentIndexBox extends FullBox("sidx", version, 0) {
   unsigned int(32) reference_ID;
   unsigned int(32) timescale;
   if (version==0)
   {
      unsigned int(32) earliest_presentation_time;
      unsigned int(32) first_offset;
   }
   else
   {
      unsigned int(64) earliest_presentation_time;
      unsigned int(64) first_offset;
   }
   unsigned int(16) reserved = 0;
   unsigned int(16) reference_count;
   for(i=1; i <= reference_count; i++)
   {
      bit(1) reference_type;
      unsigned int(31) referenced_size;
      unsigned int(32) subsegment_duration;
      bit(1) starts_with_SAP;
      unsigned int(3) SAP_type;
      unsigned int(28) SAP_delta_time;
   }
}
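
The syntax above can be read with a few lines of code. The following Python sketch is illustrative only: it assumes a complete box held in memory with a 32-bit size field (no 64-bit "largesize"), and the function name parse_sidx and the dictionary layout are hypothetical. The bit fields are unpacked from 32-bit big-endian words, with the first-listed bit in the most significant position.

```python
import struct


def parse_sidx(box: bytes) -> dict:
    """Parse one Segment Index box given as a complete byte string."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"sidx":
        raise ValueError("not a Segment Index box")
    version = box[8]
    pos = 12  # skip size, type, version and the 24-bit flags
    reference_id, timescale = struct.unpack_from(">II", box, pos)
    pos += 8
    if version == 0:
        earliest, first_offset = struct.unpack_from(">II", box, pos)
        pos += 8
    else:
        earliest, first_offset = struct.unpack_from(">QQ", box, pos)
        pos += 16
    _reserved, reference_count = struct.unpack_from(">HH", box, pos)
    pos += 4
    references = []
    for _ in range(reference_count):
        word1, duration, word3 = struct.unpack_from(">III", box, pos)
        pos += 12
        references.append({
            "reference_type": word1 >> 31,            # top bit
            "referenced_size": word1 & 0x7FFFFFFF,    # low 31 bits
            "subsegment_duration": duration,
            "starts_with_SAP": word3 >> 31,           # top bit
            "SAP_type": (word3 >> 28) & 0x7,          # next 3 bits
            "SAP_delta_time": word3 & 0x0FFFFFFF,     # low 28 bits
        })
    return {
        "reference_ID": reference_id,
        "timescale": timescale,
        "earliest_presentation_time": earliest,
        "first_offset": first_offset,
        "references": references,
    }
```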

reference_ID provides the track_ID for the reference track; if this Segment Index box is referenced from a
'parent' Segment Index box, the value of reference_ID shall be the same as the value of reference_ID of the
'parent' Segment Index box.

timescale provides the timescale, in ticks per second, for the time and duration fields within this box; it is recommended 
that this match the timescale of the reference track, i.e. the timescale field of the Media Header Box of the track. 

earliest_presentation_time is the earliest presentation time of any sample in the reference track in the first subsegment, 
expressed in the timescale of the timescale field. 

first_offset is the distance in bytes from the first byte following the containing Segment Index Box to the first byte
of the first referenced box.

reference_count: the number of elements indexed by the loop that follows.

reference_type: when set to 0 indicates that the reference is to a movie fragment ("moof") box; when set to 1 indicates
that the reference is to a segment index ("sidx") box.

referenced_size: the distance in bytes from the first byte of the referenced box to the first byte of the next referenced
box or, in the case of the last entry, the first byte not indexed by this Segment Index Box.

subsegment_duration: when the reference is to a Segment Index Box, this field carries the sum of the
subsegment_duration fields in that box; when the reference is to a subsegment, this field carries the difference between
the earliest presentation time of any sample of the reference track in the next subsegment (or the first subsegment of the
next segment, if this is the last subsegment of the segment, or the end composition time of the reference track, if this is
the last subsegment of the representation) and the earliest presentation time of any sample of the reference track in the
referenced subsegment; the duration is expressed in the timescale value in this box.
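
Because the referenced bytes and the subsegments are contiguous, the byte range and time interval of each reference follow from first_offset, referenced_size and subsegment_duration by simple accumulation. The Python sketch below illustrates this under stated assumptions: anchor is the file offset of the first byte after the Segment Index box, references is a list of (referenced_size, subsegment_duration) pairs, and the function name subsegment_ranges is hypothetical. End positions are exclusive and times are in units of the box's timescale.

```python
def subsegment_ranges(anchor, earliest_presentation_time, first_offset, references):
    """Return (byte_start, byte_end, time_start, time_end) for each reference."""
    byte_pos = anchor + first_offset          # first byte of the first referenced box
    time_pos = earliest_presentation_time     # start time of the first subsegment
    ranges = []
    for referenced_size, subsegment_duration in references:
        ranges.append((byte_pos, byte_pos + referenced_size,
                       time_pos, time_pos + subsegment_duration))
        byte_pos += referenced_size           # next referenced box starts here
        time_pos += subsegment_duration       # next subsegment starts here
    return ranges


# Two subsegments of 500 and 700 bytes, each lasting 3000 ticks
print(subsegment_ranges(100, 0, 20, [(500, 3000), (700, 3000)]))
# [(120, 620, 0, 3000), (620, 1320, 3000, 6000)]
```

A client seeking to a given presentation time could search the resulting time intervals and request only the matching byte range.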

starts_with_SAP: indicates whether the referenced subsegments start with a SAP. For the detailed semantics of this 
field in combination with other fields, see the table below. 

SAP_type: indicates a SAP type as specified in TS 26.247 [49], Annex G.6, or the value 0. Other type values are
reserved. For the detailed semantics of this field in combination with other fields, see the table below.

SAP_delta_time: indicates Tsap of the first SAP, in decoding order, in the referenced subsegment for the reference 
stream. If the referenced subsegments do not contain a SAP, SAP_delta_time is reserved with the value 0; 
otherwise SAP_delta_time is the difference between the earliest presentation time of the subsegment, and the Tsap 
(note that this difference may be zero, in the case that the subsegment starts with a SAP). 

Table 13.1: Semantics of SAP and reference type combinations

| starts_with_SAP | SAP_type | reference_type | Meaning |
|---|---|---|---|
| 0 | 0 | 0 or 1 | No information of SAPs is provided. |
| 0 | 1 to 6, inclusive | 0 (media) | The subsegment contains (but may not start with) a SAP of the given SAP_type and the first SAP of the given SAP_type corresponds to SAP_delta_time. |
| 0 | 1 to 6, inclusive | 1 (index) | All the referenced subsegments contain a SAP of at most the given SAP_type and none of these SAPs is of an unknown type. |
| 1 | 0 | 0 (media) | The subsegment starts with a SAP of an unknown type. |
| 1 | 0 | 1 (index) | All the referenced subsegments start with a SAP which may be of an unknown type. |
| 1 | 1 to 6, inclusive | 0 (media) | The referenced subsegment starts with a SAP of the given SAP_type. |
| 1 | 1 to 6, inclusive | 1 (index) | All the referenced subsegments start with a SAP of at most the given SAP_type and none of these SAPs is of an unknown type. |
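
The combinations in Table 13.1 can be expressed as a small decision function. The following Python sketch is illustrative only; it reads the rows of the table with starts_with_SAP taking the value 0 or 1 and SAP_type the value 0 or 1 to 6, and both the function name describe_sap and the abbreviated result strings are hypothetical.

```python
def describe_sap(starts_with_sap: int, sap_type: int, reference_type: int) -> str:
    """Return a short description of one row of Table 13.1."""
    if starts_with_sap not in (0, 1) or reference_type not in (0, 1):
        raise ValueError("starts_with_SAP and reference_type are single bits")
    if not 0 <= sap_type <= 6:
        raise ValueError("SAP_type values other than 0 to 6 are reserved")
    index = reference_type == 1  # 1 refers to a Segment Index box, 0 to media
    if starts_with_sap == 0:
        if sap_type == 0:
            return "no SAP information"
        if index:
            return "all referenced subsegments contain a SAP of at most the given type"
        return "subsegment contains a SAP of the given type"
    if sap_type == 0:
        if index:
            return "all referenced subsegments start with a SAP, possibly of unknown type"
        return "subsegment starts with a SAP of unknown type"
    if index:
        return "all referenced subsegments start with a SAP of at most the given type"
    return "subsegment starts with a SAP of the given type"
```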



13.5 Track Fragment Decode Time Box 

The Track Fragment Base Media Decode Time ("tfdt") Box provides the decode time of the first sample in the track
fragment. This can be useful, for example, when performing random access in a file; it is not necessary to sum the
sample durations of all preceding samples in previous fragments to find this value (where the sample durations are the
deltas in the Decoding Time to Sample Box and the sample_durations in the preceding track runs).

The Track Fragment Base Media Decode Time Box, if present, shall be positioned after the Track Fragment Header 
Box and before the first Track Fragment Run box. 

Note: the decode timeline is a media timeline, established before any explicit or implied mapping of media time to 
presentation time, for example by an edit list or similar structure. 

aligned(8) class TrackFragmentBaseMediaDecodeTimeBox
   extends FullBox("tfdt", version, 0) {
   if (version==1) {
      unsigned int(64) baseMediaDecodeTime;
   } else { // version==0
      unsigned int(32) baseMediaDecodeTime;
   }
}



version is an integer that specifies the version of this box (0 or 1 in this specification). 

baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in the media, 
expressed in the media's timescale. It does not include the samples added in the enclosing track fragment. 
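
As an informal illustration, the following Python sketch reads baseMediaDecodeTime from a complete 'tfdt' box and converts it to seconds. It assumes a 32-bit box size field and a media timescale obtained elsewhere (for example from the Media Header Box); the function names parse_tfdt and decode_time_seconds are hypothetical.

```python
import struct


def parse_tfdt(box: bytes) -> int:
    """Return baseMediaDecodeTime from a complete 'tfdt' box."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"tfdt":
        raise ValueError("not a Track Fragment Base Media Decode Time box")
    version = box[8]
    # The field is 64 bits wide in version 1 and 32 bits wide in version 0
    fmt = ">Q" if version == 1 else ">I"
    (base_media_decode_time,) = struct.unpack_from(fmt, box, 12)
    return base_media_decode_time


def decode_time_seconds(base_media_decode_time: int, timescale: int) -> float:
    """Convert a decode time in media timescale units to seconds."""
    return base_media_decode_time / timescale
```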

Annex A (normative):

MIME Type Registrations for 3GP files 

A.1 MIME Types 
A.1.1 General 

This registration is an update and replacement of RFC 3839. It applies to all files defined as using the '3GP' file format 
and identified with a suitable brand in a 3GPP specification. The usual file suffix for all these files is ".3gp". The 
difference between the current registration and RFC 3839 is the inclusion of two optional parameters. 

A.1.2 Files with audio but no visual content

The type "audio/3gpp" may be used for files containing audio but no visual presentation (neither video nor timed text,
for example).

Type name: audio 

Subtype name: 3gpp 

Required parameters: none 

Optional parameters: 

codecs: is a single value or a comma-separated list that identifies the codec(s) needed for rendering the
content contained (in tracks) in a file. The codecs parameter is defined in RFC 6381 [32]. The ISO
file format name space and ISO syntax in clauses 3.3 and 3.4 of RFC 6381, respectively, shall be
used together with additions defined in clause A.2.2 of the present document.

types: is a single value or a comma-separated list that identifies the MIME media types of the content
contained (in items) in a file. It is defined in clause A.2.3 of the present document.

Encoding considerations: files are binary and should be transmitted in a suitable encoding without CR/LF conversion, 
7-bit stripping etc.; base64 (RFC 4648 [35]) is a suitable encoding. 

Security considerations: see the security considerations section in A.3 of the present document.

Interoperability considerations: The 3GPP organization has defined the specification, interoperability, and conformance. 
IMTC conducts interoperability testing. 

Published specification: 3GPP TS 26.234, Release 5; 3GPP TS 26.244, Release 6 or later. 3GPP specifications are 
publicly accessible at the 3GPP web site, www.3gpp.org. 

Applications which use this media type: Multi-media 

Additional information: The type "audio/3gpp" may be used for files containing audio but no visual presentation. Files
served under this type must not contain any visual material. (Note that timed text is visually 
presented and is considered to be visual material). 

Magic number(s): None. However, the file-type box must occur first in the file, and must contain a 3GPP brand in its 
compatible brands list. 

File extension(s): '3gp' and '3gpp' are both declared at http://www.nist.gov/nics/; 3gp is preferred 

Macintosh File Type Code(s): '3gpp' 

Intended usage: COMMON

Restrictions on usage: Note that this MIME type is used only for files; separate types are used for real-time transfer,
such as for the RTP payload format for AMR audio (RFC 4867 [15]).

Author: 3GPP TSG SA WG4

Change controller: 3GPP TSG SA 
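
The registration above declares no magic number but requires the file-type box to occur first and to list a 3GPP brand among its compatible brands. The Python sketch below shows one way a receiver might check this. It is illustrative only: it assumes the usual 'ftyp' layout (32-bit size, type, major brand, minor version, then four-character compatible brands), and it treats any brand beginning with "3g" as a 3GPP brand, which is a simplifying assumption of this sketch; the applicable brands are those identified in 3GPP specifications. The function name has_3gpp_brand is hypothetical.

```python
import struct


def has_3gpp_brand(data: bytes) -> bool:
    """Check that data starts with an 'ftyp' box listing a brand that begins with '3g'."""
    if len(data) < 16:
        return False
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return False
    # Compatible brands start after size, type, major brand and minor version
    brands = [data[i:i + 4] for i in range(16, size - 3, 4)]
    # Assumption of this sketch: a 3GPP brand is any brand starting with "3g"
    return any(brand.startswith(b"3g") for brand in brands)
```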

A.1.3 Any files 

The type "video/3gpp" is valid for all files. It is valid to serve an audio-only file as "video/3gpp". 

MIME media type name: video 
MIME subtype name: 3gpp 

Required parameters: none 

Optional parameters: 

codecs: is a single value or a comma-separated list that identifies the codec(s) needed for rendering the
content contained (in tracks) in a file. The codecs parameter is defined in RFC 6381 [32]. The ISO
file format name space and ISO syntax in clauses 3.3 and 3.4 of RFC 6381, respectively, shall be
used together with additions defined in clause A.2.2 of the present document.

types: is a single value or a comma-separated list that identifies the MIME media types of the content
contained (in items) in a file. It is defined in clause A.2.3 of the present document.

Encoding considerations: files are binary and should be transmitted in a suitable encoding without CR/LF conversion, 
7-bit stripping etc.; base64 (RFC 4648 [35]) is a suitable encoding. 

Security considerations: see the security considerations section in A.3 of the present document.

Interoperability considerations: The 3GPP organization has defined the specification, interoperability, and conformance. 
IMTC conducts interoperability testing. 

Published specification: 3GPP TS 26.234, Release 5; 3GPP TS 26.244, Release 6 or later. 3GPP specifications are 
publicly accessible at the 3GPP web site, www.3gpp.org. 

Applications which use this media type: Multi-media 

Additional information: 

Magic number(s): None. However, the file-type box must occur first in the file, and must contain a 3GPP brand in its 
compatible brands list. 

File extension(s): '3gp' and '3gpp' are both declared at http://www.nist.gov/nics/; 3gp is preferred 

Macintosh File Type Code(s): '3gpp' 

Intended usage: COMMON 

Restrictions on usage: Note that this MIME type is used only for files; separate types are used for real-time transfer, 
such as for the RTP payload format for AMR audio (RFC 4867 [15]). 

Author: 3GPP TSG SA WG4

Change controller: 3GPP TSG SA 

A.1.4 video/vnd.3gpp.segment

Type name: video

Subtype name: vnd.3gpp.segment

Required parameters: None.

Optional parameters: None.

Encoding considerations: Files are binary and should be transmitted in a suitable encoding without CR/LF conversion,
7-bit stripping etc.; base64 (RFC 4648 [35]) is a suitable encoding.

Security considerations: See the security considerations section in A.3 of the present document.

Interoperability considerations: None.

Published specification: 3GPP TS 26.244, Release 9.

Applications which use this media type: Third Generation Partnership Project (3GPP) Adaptive HTTP Streaming.

Additional information:

Magic number(s): None

File extension(s): 3gs

Person & email address to contact for further information: John Meredith (john.meredith@etsi.org)

Intended usage: Common

Restrictions on usage:

Author: 3GPP TSG SA WG4

Change controller: 3GPP TSG SA

A.2 Optional parameters
A.2.1 General 

Two optional parameters are defined here for the "audio/3gpp" and "video/3gpp" media types. Additional parameters 
may be specified by updating the media type registrations. Any unknown parameter shall be ignored. 

A.2.2 Codecs parameter

The codecs parameter is defined in RFC 6381. The ISO file format name space and ISO syntax in clauses 3.3 and 3.4 of 
RFC 6381 [32] shall be used together with extensions to the ISO syntax specified here. 

The syntax in clause 3.4 of RFC 6381 defines the usage of the codecs parameter for files based on the ISO base media 
file format and specifies that the first element of a parameter value is a sample description entry four-character code. It 
also includes specific definitions for MPEG audio ('mp4a') and MPEG video ('mp4v') where each value in addition to 
the four-character code includes two elements signalling Object Type Indications and Profile Level Indications (video 
only). It also includes specific definitions for Advanced Video Coding ("avc1") where each value in addition to the
four-character code includes a second element (referred to as "avcoti" in the formal syntax), which is the hexadecimal
representation of the following three bytes in the (subset) sequence parameter set Network Abstraction Layer (NAL)
unit specified in [29]: (1) profile_idc, (2) the byte containing the constraint_set flags (currently constraint_set0_flag
through constraint_set5_flag, and the reserved_zero_2bits), and (3) level_idc. Note also that reserved_zero_2bits is
required to be equal to 0 in [29], but other values for it may be specified in the future by ITU-T or ISO/IEC. These
definitions apply to the MPEG codecs used by the 3GP file format, such as H.264 (AVC) [29], MPEG-4 Visual [10], 
MPEG-4 AAC [13] and Enhanced aacPlus [23, 24, 25]. Values for other codecs used by the 3GP file format are 
specified below. 
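
As an informal illustration of the "avc1" value described above, the Python sketch below splits the second element into its three bytes. It assumes a value of the form avc1.PPCCLL with exactly six hexadecimal digits; the function name parse_avcoti and the example value are hypothetical. The constraint_set0_flag is taken from the most significant bit of the second byte, following the order of the flags in the sequence parameter set.

```python
def parse_avcoti(value: str):
    """Split an 'avc1.PPCCLL' codecs value into profile, constraint flags and level."""
    name, _, hex_part = value.partition(".")
    if name != "avc1" or len(hex_part) != 6:
        raise ValueError("expected 'avc1.' followed by six hexadecimal digits")
    profile_idc, constraints, level_idc = bytes.fromhex(hex_part)
    # constraint_set0_flag .. constraint_set5_flag, most significant bit first;
    # the two least significant bits are reserved_zero_2bits
    flags = [(constraints >> (7 - i)) & 1 for i in range(6)]
    return profile_idc, flags, level_idc


print(parse_avcoti("avc1.42E01E"))
# (66, [1, 1, 1, 0, 0, 0], 30)
```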

When the first element of a value is 's263', indicating H.263 video [9], the second element is the decimal representation
of the profile, e.g. 0 or 3, and the third element is the decimal representation of the level, e.g. 10 or 45.

When the first element of a value is one of the following elements, no other elements are defined for that value: 

- 'samr', indicating AMR narrow-band speech [11];

- 'sawb', indicating AMR wide-band speech [12];

- 'sawp', indicating Extended AMR wide-band audio [21];

- 'tx3g', indicating timed text [4].

The following syntax defines all values above in ABNF (RFC 4234 [31]) by extending the definition in clause 3.4 of 
RFC 6381: 

id-iso = iso-gen / iso-mpega / iso-mpegv / iso-amr / iso-amr-wb / iso-amr-wbp / iso-tt / iso-h263 

; iso-gen, iso-mpega, iso-mpegv, iso-avc as defined in RFC 6381

iso-amr = %x73.61.6d.72 ; 'samr' 

iso-amr-wb = %x73.61.77.62 ; 'sawb' 

iso-amr-wbp = %x73.61.77.70 ; 'sawp'

iso-tt = %x74.78.33.67 ; 'tx3g' 

iso-h263 = s263 "." h263-profile "." h263-level

s263 = %x73.32.36.33 ; 's263' 

h263-profile = 1*DIGIT 

h263-level = 1*DIGIT 
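
The 3GPP additions to the codecs syntax can be checked with a short routine. The Python sketch below is illustrative only and covers just the values defined in this clause ('s263' with profile and level, and the four codes that take no further elements); values defined in RFC 6381 itself, such as those for 'mp4a', 'mp4v' or 'avc1', are outside its scope. The function name parse_3gpp_codec is hypothetical.

```python
import re

# Four-character codes for which no further elements are defined
_SIMPLE_CODES = {"samr", "sawb", "sawp", "tx3g"}

# s263 "." h263-profile "." h263-level, each 1*DIGIT
_H263 = re.compile(r"s263\.([0-9]+)\.([0-9]+)")


def parse_3gpp_codec(value: str):
    """Return a tuple describing the value, or None if it is not one of the 3GPP additions."""
    if value in _SIMPLE_CODES:
        return (value,)
    match = _H263.fullmatch(value)
    if match:
        return ("s263", int(match.group(1)), int(match.group(2)))
    return None


print(parse_3gpp_codec("s263.0.10"))  # ('s263', 0, 10)
print(parse_3gpp_codec("samr"))       # ('samr',)
```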






A.2.3 Types parameter 



The types parameter is a single value or a comma-separated list that identifies the MIME media types of the content
contained (in items) of a 3GP file. Each value consists of a type-subtype pair and corresponds to a value of the
content_type field provided for an item in the item information box.

If the types parameter is present, then it shall include all MIME types needed for rendering the content contained (in 
items) of a file. 

The types parameter is defined in ABNF (RFC 5234 [44]) below: 

types = "types" "=" type-list

type-entry = type-name "/" subtype-name *( *WSP ";" *WSP parameter )

parameter = attribute *WSP "=" *WSP value

attribute = token

value = token / quoted-string

token = 1*(%x21 / %x23-27 / %x2A-2B / %x2D-2E / %x30-39
        / %x41-5A / %x5E-7E)
        ; 1*<any CHAR except SP, CTLs or tspecials>

type-list = DQUOTE type-entry *( "," type-entry ) DQUOTE

'type-name' and 'subtype-name' are defined in RFC 4288 [42].
'tspecials' is defined in RFC 2045 [45].
'quoted-string' is defined in RFC 5322 [43].
'CHAR', 'CTL', 'SP', 'WSP' and 'DQUOTE' are defined in RFC 5234 [31].

NOTE: any <"> character in "type-entry" needs to be escaped with "\". This is not shown in the above grammar. 
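
As an informal illustration of the grammar and the NOTE above, the Python sketch below assembles a types parameter from a list of type entries, escaping each double quote inside an entry with a backslash. It does not validate the entries against the grammar, and the function name format_types_parameter is hypothetical.

```python
def format_types_parameter(entries):
    """Build a types parameter string from a list of type-entry strings."""
    # Any <"> inside a type-entry is escaped with "\" as required by the NOTE
    escaped = [entry.replace('"', '\\"') for entry in entries]
    return 'types="' + ",".join(escaped) + '"'


print(format_types_parameter(["image/jpeg", 'text/plain;charset="utf-8"']))
# types="image/jpeg,text/plain;charset=\"utf-8\""
```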



A.3 Security considerations 



The 3GPP file format may contain audio, video, displayable text data, images, graphics, scene descriptions, etc. Clearly 
it is possible to author malicious files which attempt to call for an excessively large picture size, high sampling-rate 
audio etc. However, clients can and usually do protect themselves against this kind of attack. It should be noted that 
selected metadata fields may encompass information partly intended to protect the media against unauthorized use or 
distribution. In this case, the intention is that alteration or removal of the data in the field would be treated as an offense 
under national agreements based on World Intellectual Property Organization (WIPO) treaties. 

There is no current provision in the standards for signing or authentication of these file formats.

Annex B (informative):
Change history 



| Date | TSG# | TSG Doc. | CR | Rev | Subject/Comment | Old | New |
|---|---|---|---|---|---|---|---|
| 2004-03 | 23 | SP-040065 | | | Approved at TSG#23 | | 6.0.0 |
| 2004-09 | 25 | SP-040643 | 002 | 1 | Storage of AMR-WB+ audio in 3GP files | 6.0.0 | 6.1.0 |
| 2004-09 | 25 | SP-040654 | 003 | | Additional Release 6 update to 3GP file format | 6.0.0 | 6.1.0 |
| 2004-09 | 25 | SP-040657 | 004 | 1 | Storage of H.264 (AVC) video in 3GP files | 6.0.0 | 6.1.0 |
| 2004-09 | 25 | SP-040643 | 005 | 1 | Storage of Enhanced aacPlus audio in 3GP files | 6.0.0 | 6.1.0 |
| 2004-12 | 26 | SP-040839 | 006 | 1 | Correction of syntax of encryption boxes and outdated references | 6.1.0 | 6.2.0 |
| 2004-12 | 26 | SP-040839 | 007 | | Correction of sample structure for AMR-WB+ in 3GP files | 6.1.0 | 6.2.0 |
| 2005-03 | 27 | SP-050094 | 008 | 1 | Extended presentations in 3GP files for MBMS | 6.2.0 | 6.3.0 |
| 2005-09 | 29 | SP-050427 | 0009 | 1 | New UDTA sub-box "albm" - album for the media | 6.3.0 | 6.4.0 |
| 2005-09 | 29 | SP-050427 | 0010 | | Correction of SDP bandwidth modifiers | 6.3.0 | 6.4.0 |
| 2005-09 | 29 | SP-050427 | 0011 | | Correction regarding sample groups in 3GP file format | 6.3.0 | 6.4.0 |


2006-06 


32 


SP-060355